Cost Management

Next-gen FinOps for the AI era

Wed, 22 Apr 2026 12:00:00 +0000

Today we’re excited to announce the next generation of our FinOps product suite to help our customers increase operational efficiency, better understand their costs, and control them with Spend Caps.

What’s new: We’re introducing a new FinOps Explainability agent, which is designed to operate autonomously and investigate the drivers of your AI-related cloud costs. This is in addition to new FinOps tooling that provides commercial auditability.

We’re also announcing a private preview of Spend Caps in Google Cloud, enabling FinOps and DevOps managers to set budgets and enforce cost boundaries at the project level for Google AI Studio (AIS), Gemini Enterprise Agent Platform (the evolution of Vertex AI), Cloud Run, Cloud Run Functions, and Maps. These caps alert and ultimately pause API traffic once your set budget is reached.

Why it matters for your business: These new FinOps tools give you clear visibility into AI costs, increase control with Spend Caps to prevent overspending, and offer the commercial flexibility needed to scale your AI innovations efficiently. Customers using our existing FinOps tools are already seeing measurable improvements: since we launched Gemini Cloud Assist (GCA) for FinOps last year, cost reporting adoption has risen 75%, while the time customers spend on FinOps cost analysis has fallen 18%.

Where to get started: You can access the FinOps Explainability agent in the console here, along with the FinOps tooling. Customers can sign up for the private preview of Spend Caps here.

Goodbye static reporting

Given the number of variables and services that a large enterprise commonly runs, cloud cost reporting can be noisy, even though, in theory, AI costs are just the result of quantity (q) times price (p). Quantity can be driven by a large mix of variables such as API request traffic, error logs, fluctuating token counts, or even cloud storage. Price often fluctuates with different AI model types and frequent provider price shifts. As a result, FinOps and DevOps managers struggle to synthesize this data to identify efficiency opportunities or take timely action.
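
To make the quantity-times-price framing concrete, here is a minimal Python sketch that splits a period-over-period cost change into a quantity effect and a price effect. The function name `decompose_cost_change` and the figures are illustrative assumptions, not output from any Google Cloud tool.

```python
def decompose_cost_change(q0, p0, q1, p1):
    """Split the change in cost (quantity * price) between two periods."""
    quantity_effect = (q1 - q0) * p0      # usage change valued at the old price
    price_effect = (p1 - p0) * q0         # price change valued at the old usage
    interaction = (q1 - q0) * (p1 - p0)   # both changing at the same time
    return {
        "quantity_effect": quantity_effect,
        "price_effect": price_effect,
        "interaction": interaction,
        "total_change": q1 * p1 - q0 * p0,
    }


# Hypothetical figures: millions of tokens, dollars per million tokens.
result = decompose_cost_change(q0=100, p0=2, q1=150, p1=3)
print(result)
```

In this made-up case the bill rises from $200 to $450: $100 of the increase comes from higher usage, $100 from the higher price, and $50 from the two moving together.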

In Google Cloud Billing, we used Gemini to develop our new FinOps Explainability agent, which autonomously helps users understand the drivers of AI costs. Attributing ROI to AI projects requires a clear understanding of their costs, but because AI often piggybacks on existing infrastructure, its expenses frequently blur into the general cost of doing business.

Now you can use the FinOps Explainability agent to identify your AI cost drivers automatically, and use it to answer questions like: “How much did I spend on Gemini 1.5 Pro versus Gemini 1.5 Flash?” Or, “Break down my total spend by API Key so I can see which integration is expensive.” Or, “Show me the split between Input Token costs and Output Token costs for Gemini 3.0 Pro.” You can quickly discover which services and projects are driving your AI costs.

FinOps Explainability agent helps you analyze AI costs, drivers & trends

Hello automated Spend Caps

The speed of AI adoption and usage is driving cloud spend that behaves differently than traditional cloud spend. AI uses specialized hardware (TPUs/GPUs), and a single runaway training job or unoptimized model running on that hardware can drain a budget in a very short amount of time. Users are also constantly experimenting. Traditional cost control tools typically alert managers, but don’t enforce budget caps. The result: many enterprises have been forced to build their own complex custom spend guardrails, enforced through destructive actions that may be time consuming to reverse, such as disassociating forms of payment.

We’re excited to announce that Spend Caps are coming soon to Google Cloud. Designed to work with Google Cloud Budgets, Spend Caps let FinOps and DevOps managers set budgets that enforce automated cost boundaries (caps) at the project level for AIS, Agent Platform, Cloud Run, Cloud Run Functions, and Maps. These caps alert and ultimately pause API traffic once your set budget is reached, but leave your resources intact. If you need the traffic to resume, simply suspend the Spend Cap.
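
The alert-then-pause behavior described above can be pictured as a small state model. The Python sketch below is a toy illustration only: the class `SpendCapModel`, its fields, and the 80% alert threshold are assumptions made for this example and do not correspond to any Google Cloud API or default.

```python
from dataclasses import dataclass


@dataclass
class SpendCapModel:
    """Toy model of a project-level cap; not a Google Cloud API."""
    budget: float
    alert_fraction: float = 0.8   # assumed alert threshold, for illustration
    suspended: bool = False       # True once the user suspends the cap
    spend: float = 0.0

    def record_spend(self, amount: float) -> None:
        self.spend += amount

    def status(self) -> str:
        if self.suspended:
            return "serving"      # cap suspended, traffic resumes
        if self.spend >= self.budget:
            return "paused"       # API traffic paused, resources left intact
        if self.spend >= self.alert_fraction * self.budget:
            return "alerting"     # still serving, but managers are notified
        return "serving"


cap = SpendCapModel(budget=100.0)
history = []
for amount in (50.0, 35.0, 20.0):
    cap.record_spend(amount)
    history.append(cap.status())

cap.suspended = True              # the user suspends the cap to resume traffic
history.append(cap.status())
print(history)
```

The point of the sketch is that pausing is a reversible state rather than a destructive action: nothing about the project changes except whether traffic is served.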

We expect customers that want to contain the costs of AI R&D to benefit immensely from this new feature. You can sign up for the private preview today.

Spend Caps help prevent cost overruns

Real commercial incentive auditability

Google Cloud meets you at every stage of growth—offering the commercial flexibility, startup programs, and enterprise incentives needed to help your costs scale efficiently. To help users more clearly understand the connection between commercial agreements and the services being billed, we’ve designed our FinOps tooling to provide end-to-end auditability of our commercial obligations. With the private preview rollout of enhanced billing account hierarchies, customers can view their aggregated spend across multiple billing accounts, including Other Eligible Services (OES) spend. Additionally, we are announcing a private preview for Google Cloud contract commitment reporting, providing visibility into Google Cloud commit contract burndown within your Enterprise Agreement.

The future of FinOps is here. Built with AI for AI.

With the FinOps Explainability agent for deep visibility, Spend Caps for increased control, and enhanced billing account hierarchies with contract commitment reporting for ultimate commercial flexibility, Google Cloud is empowering you to scale your AI innovations with confidence and precision.

How to find the sweet spot between cost and performance

Mon, 13 Apr 2026 16:00:00 +0000

At Google Cloud, we often see customers asking themselves: "How can we manage our generative AI costs effectively without sacrificing the performance and availability our applications demand?"

This is the million-dollar question — or, perhaps more accurately, the "tokens-per-minute" question. The key isn't just about choosing the cheapest option, but about finding the right recipe of tools and services that aligns with your workload patterns.

This guide will walk you through Google Cloud's flexible gen AI infrastructure options, showing you how to find that sweet spot on the efficient frontier between cost and performance. We'll start with the foundational pay-as-you-go (PayGo) models and then explore how to layer on more specialized options to build a robust and cost-effective gen AI strategy.

Understanding your foundation: Pay-as-You-Go (PayGo) options

For many workloads, Google Cloud's standard PayGo offerings provide a powerful and flexible starting point. To get the most out of them, it's crucial to understand the mechanisms that govern performance and availability.

1. Dynamic Shared Quota (DSQ)

At its core, the standard PayGo environment operates on a principle of fairness and efficiency called Dynamic Shared Quota (DSQ). Instead of enforcing rigid, per-customer limits, DSQ intelligently distributes available GenAI capacity among all customers.

How it works:

High-priority lane: Your organization has a default Tokens Per Second (TPS) threshold. Any requests you send that fall within this threshold are given higher priority. This lane is designed to provide high availability, targeting a 99.5% SLO.
Best-effort lane: If you experience a spike in traffic and exceed your TPS threshold, your excess requests are not immediately dropped. Instead, they are handled with lower priority, receiving throughput when there is spare capacity available.

This system is designed so that sudden traffic spikes from one customer do not negatively impact the baseline performance of others. You get a reliable level of service for your everyday needs, with the potential to burst when the system has capacity to spare.
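
As a rough mental model of the two lanes, the following Python sketch splits one second of token demand into a high-priority portion and a best-effort portion. The function `split_lanes` and its parameters are hypothetical; the source does not describe how the real scheduler is implemented.

```python
def split_lanes(demand_tps, threshold_tps, spare_capacity_tps):
    """Split token demand into (high_priority, best_effort_served, not_served).

    demand_tps:         tokens per second the organization is sending
    threshold_tps:      the organization's default TPS threshold
    spare_capacity_tps: idle system capacity available for best-effort traffic
    """
    high_priority = min(demand_tps, threshold_tps)
    excess = max(demand_tps - threshold_tps, 0)
    best_effort_served = min(excess, spare_capacity_tps)
    not_served = excess - best_effort_served
    return high_priority, best_effort_served, not_served


print(split_lanes(1500, 1000, 300))
print(split_lanes(800, 1000, 0))
```

With a 1,000 TPS threshold, a 1,500 TPS spike keeps 1,000 TPS in the high-priority lane, while the remaining 500 TPS is served only as far as spare capacity allows.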

2. Usage tiers: Rewarding your investment

To provide more predictable performance as your gen AI usage grows, Google Cloud automatically places your organization into Usage Tiers based on your rolling 30-day spend on eligible Vertex AI services. The higher your tier, the higher your guaranteed Tokens Per Minute (TPM) limit.

At the time of this article, these are the tiers for our popular model families:

Model Family	Tier	Spend (30 days)	TPM
Pro Models	Tier 1	$10 - $250	500,000
Pro Models	Tier 2	$250 - $2,000	1,000,000
Pro Models	Tier 3	> $2,000	2,000,000
Flash / Flash-Lite Models	Tier 1	$10 - $250	2,000,000
Flash / Flash-Lite Models	Tier 2	$250 - $2,000	4,000,000
Flash / Flash-Lite Models	Tier 3	> $2,000	10,000,000

Important: For the most up-to-date models and thresholds, always refer to the documentation.

Crucially, you should think of your tier limit as a floor, not a ceiling.

Critical traffic: Traffic up to your organization's tier limit is protected. You should experience minimal to no 429 (resource exhausted) errors as long as you stay within this baseline.
Opportunistic bursting: When you exceed your tier limit, you can still burst to use spare system capacity on a best-effort basis. If the entire system is under heavy load, fair-share throttling will engage for this excess traffic. The key takeaway is that we don't artificially cap your performance if there's idle capacity available.
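
The tier table above can be expressed as a simple lookup. This Python sketch uses the thresholds quoted in this article; the function name `usage_tier` is hypothetical, and how a spend of exactly $250 is classified is an assumption, since the table's ranges overlap at that boundary.

```python
# TPM limits per tier, copied from the table in this article.
TPM_BY_TIER = {
    "pro": {1: 500_000, 2: 1_000_000, 3: 2_000_000},
    "flash": {1: 2_000_000, 2: 4_000_000, 3: 10_000_000},
}


def usage_tier(family, spend_30d):
    """Return (tier, tpm_limit) for a rolling 30-day spend, or None below $10."""
    if spend_30d > 2000:
        tier = 3
    elif spend_30d >= 250:      # assumption: exactly $250 counts as Tier 2
        tier = 2
    elif spend_30d >= 10:
        tier = 1
    else:
        return None             # below the lowest tier listed in the table
    return tier, TPM_BY_TIER[family][tier]


print(usage_tier("pro", 300))
print(usage_tier("flash", 5000))
```

Because thresholds change over time, treat the constants here as a snapshot and check the documentation before relying on them.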

3. Priority PayGo: Your insurance policy for spikes

What if your workload is prone to unpredictable spikes and you can't risk 429 errors, but you're not ready to commit to a fixed capacity model? This is where Priority PayGo comes in. It's designed to give you the best of both worlds: the flexibility of PayGo with the high availability needed for important traffic.

For a premium, you can tag specific API requests for higher priority.

Important: The Priority PayGo feature is currently available only for the global endpoint. A future release on regional endpoints is possible but not guaranteed.

How to use Priority PayGo: It's as simple as adding a header to your API call. No sign-up or commitment is needed.

    curl -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      -H "X-Vertex-AI-LLM-Shared-Request-Type: priority" \
      https://aiplatform.googleapis.com/...
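
If you call the API from Python rather than curl, the same header can be added to whatever HTTP client you use. The sketch below only builds the header dictionary; the helper name `priority_headers` is hypothetical, and the endpoint URL is left out because the article elides it.

```python
PRIORITY_HEADER = "X-Vertex-AI-LLM-Shared-Request-Type"


def priority_headers(access_token, priority=True):
    """Build request headers, optionally tagging the call as Priority PayGo."""
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }
    if priority:
        headers[PRIORITY_HEADER] = "priority"
    return headers


# "example-token" is a placeholder, not a real credential.
headers = priority_headers("example-token")
print(headers[PRIORITY_HEADER])
```

Keeping the flag as a function argument makes it easy to tag only the requests that justify the premium.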

Be mindful of the ramp limit. As the images below illustrate, ramping up priority requests too quickly can cause some requests to be downgraded to standard priority if capacity is constrained. A slower, more gradual ramp-up ensures the best experience and mitigates downgrading.

For example:

The system tries to serve priority requests even when they are above the ramp limit; however, they are subject to downgrading (not throttling) when capacity is constrained.

Ramping priority requests within the limit mitigates downgrading and ensures a good experience.

You can monitor your Priority PayGo request usage by following this documentation.

For the uncompromising workload: Provisioned Throughput (PT)

When your gen AI workload is absolutely business-critical and you need an explicit availability guarantee, it's time to consider PT.

With PT, you reserve a specific amount of model processing capacity for a fixed monthly cost. This is the only way to get an availability SLA. While a standard PayGo model has an uptime SLA (the model is up), PT provides an availability SLA (your requests will be processed).

Let’s look more closely at the definition of “error rate”: the number of Valid Requests that result in a response with HTTP Status 5XX and Code "Internal Error", divided by the total number of Valid Requests during that period, subject to a minimum of 2,000 Valid Requests in the measurement period.

Standard PayGo returns a 429 (“Resource exhausted”) error when capacity is unavailable, and those calls are not counted in the error rate. With standard Provisioned Throughput, when you use less than your purchased amount, errors that might otherwise be 429s are returned as 5XX and count toward the SLA error rate. This is what defines the SLA difference between PT and PayGo.
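
The error-rate definition above translates directly into a small calculation. In this Python sketch, the function name `sla_error_rate` is hypothetical, and returning `None` when there are fewer than 2,000 Valid Requests is an assumption about how the minimum is applied.

```python
MIN_VALID_REQUESTS = 2000  # minimum from the definition quoted in this article


def sla_error_rate(valid_requests, internal_errors_5xx):
    """Return 5XX "Internal Error" responses divided by Valid Requests.

    Returns None when the period has too few Valid Requests to be measured
    (an assumption about how the minimum applies).
    """
    if valid_requests < MIN_VALID_REQUESTS:
        return None
    return internal_errors_5xx / valid_requests


print(sla_error_rate(10_000, 50))
print(sla_error_rate(1_500, 10))
```

Only 5XX responses feed the numerator, which is why returning a shortfall as 5XX instead of 429 is what makes it count against the Provisioned Throughput SLA.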

This makes Provisioned Throughput the ideal choice for:

Large, predictable production workloads.
Applications with strict performance requirements where throttling is not an option.

Fine-grained control over your PT requests

By default, any usage above your PT order automatically spills over to PayGo. However, you can control this behavior at the request level using HTTP headers:

Prevent overages: To ensure you never exceed your PT commitment and deny any excess requests, add the dedicated header. This is useful for strict budget control.

    {"X-Vertex-AI-LLM-Request-Type": "dedicated"}

Bypass PT on-demand: To intentionally send a lower-priority request to the PayGo pool even though you have a PT order, use the shared header. This is perfect for experimenting or running non-critical jobs without consuming your reserved capacity.

    {"X-Vertex-AI-LLM-Request-Type": "shared"}
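
One way to keep these choices explicit in application code is a small helper that returns the right header for each routing mode. This is a sketch: the function `pt_routing_headers` and the mode name "default" are hypothetical, while the header name and its two values come from the examples above.

```python
PT_HEADER = "X-Vertex-AI-LLM-Request-Type"


def pt_routing_headers(mode="default"):
    """Return the extra header for a request, based on the routing mode.

    "default":   no header; usage above the PT order spills over to PayGo
    "dedicated": PT only; requests beyond the order are denied
    "shared":    bypass PT and use the PayGo pool
    """
    if mode == "default":
        return {}
    if mode in ("dedicated", "shared"):
        return {PT_HEADER: mode}
    raise ValueError(f"unknown routing mode: {mode!r}")


print(pt_routing_headers("dedicated"))
print(pt_routing_headers("shared"))
```

Merge the returned dictionary into the headers of the request you are already sending.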

Monitoring your investment

You can closely monitor your Provisioned Throughput usage using Cloud Monitoring metrics on the aiplatform.googleapis.com/PublisherModel resource. Key metrics include:

/dedicated_gsu_limit: Your dedicated limit in Generative Scale Units (GSUs).
/consumed_token_throughput: Your actual throughput usage, accounting for the model's burndown rate.
/dedicated_token_limit: Your dedicated limit measured in tokens per second.

This allows you to ensure you are getting the value you paid for and helps you right-size your commitment over time. To learn more about PT on Vertex AI, visit our guide here.
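
Once you have exported samples of these metrics, right-sizing largely comes down to comparing consumption with the reserved limit. The Python sketch below assumes that consumed throughput and the dedicated token limit are expressed in the same unit (tokens per second); the function name and sample values are hypothetical, and retrieving the samples from Cloud Monitoring is out of scope here.

```python
def pt_utilization(consumed_samples, dedicated_token_limit):
    """Return (mean, peak) utilization of reserved throughput as fractions."""
    if not consumed_samples:
        raise ValueError("need at least one sample")
    if dedicated_token_limit <= 0:
        raise ValueError("dedicated_token_limit must be positive")
    ratios = [c / dedicated_token_limit for c in consumed_samples]
    return sum(ratios) / len(ratios), max(ratios)


# Hypothetical samples of consumed throughput against a 1,000 tokens/s limit.
mean_util, peak_util = pt_utilization([400, 800, 1000], 1000)
print(round(mean_util, 3), peak_util)
```

A low mean with a peak near 1.0 suggests that a smaller PT order combined with PayGo for the peaks could be worth evaluating.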

Building your recipe: Combining options for optimal results

Consider a workload with a predictable daily baseline, expected peaks, and the occasional unexpected spike. The optimal recipe would be:

Provisioned Throughput: Cover your predictable, mission-critical baseload. This gives you an availability SLA for the core of your application.
Priority PayGo: Use this to handle predictable peaks that rise above your PT commitment or for important traffic that is less frequent. This acts as a cost-effective insurance policy against 429 errors for your most important variable traffic.
Standard PayGo (within tier limit): This forms your foundation for general, non-critical traffic that fits comfortably within your organization's usage tier.
Standard PayGo (opportunistic bursting): For non-critical, latency-insensitive jobs (like batch processing), you can rely on the best-effort bursting of the standard PayGo model. If some of these requests are throttled, it won't impact your core user experience, and you don't pay a premium for them.

By understanding and combining these powerful tools, you can move beyond simply managing costs and start truly optimizing your GenAI strategy for the perfect balance of performance, availability, and value.

Extra bonus: Batch API and Flex PayGo

Starting with the Batch API, not every LLM request needs a sub-second time-to-first-token (TTFT). If a user is chatting with a customer service bot, low latency is critical. But if you are classifying millions of support tickets from last month, running evaluations, or generating daily summary reports, nobody is sitting at a screen waiting for a real-time stream. This is where the Gemini Batch API becomes your best friend. Customers can bundle up a massive payload of requests into a single file and submit it asynchronously. The infrastructure processes these workloads during off-peak windows or when idle compute capacity is available. The target turnaround time is 24 hours, though in practice, it is typically much faster. By trading immediate execution for asynchronous processing, you get a 50% discount on standard token costs.

While Batch handles your offline heavy lifting, your live apps still need real-time computation. But not all requests are latency-driven, and customers might be willing to wait a little longer in exchange for a discount on standard token costs. Flex PayGo provides a highly cost-effective way to access Gemini models, offering a 50% discount compared to Standard PayGo. Optimized for non-critical workloads that can accommodate response times of up to 30 minutes, it allows for seamless transitions between Provisioned Throughput (PT), Standard PayGo, and Flex PayGo with minimal code changes. Ideal use cases include:

Offline analysis of text and multimodal files.
Model quality evaluation and benchmarking.
Data annotation and labeling.
Automated product catalog generation.
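
To see what the 50% discounts mean for a given workload, you can compare the same token volume across modes. In this Python sketch, the discounts come from this article, while the function name, the token volume, and the $2.00 per million tokens price are made-up values for illustration.

```python
# Discounts relative to standard token costs, as stated in this article.
DISCOUNTS = {"standard": 0.0, "batch": 0.5, "flex": 0.5}


def token_cost(tokens, price_per_million, mode="standard"):
    """Cost of processing `tokens` at a given price per million tokens."""
    return tokens / 1_000_000 * price_per_million * (1 - DISCOUNTS[mode])


# Hypothetical job: 200 million tokens at $2.00 per million tokens.
standard = token_cost(200_000_000, 2.0)
batch = token_cost(200_000_000, 2.0, mode="batch")
print(standard, batch)
```

Real pricing differs by model and by input versus output tokens, so use the pricing page for actual rates.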

Get started

Explore the models in Vertex AI: Discover the full range of Google's first-party models as well as over 100 open-source models available in the Model Garden.
Dive deeper into the documentation: For the most up-to-date technical details, thresholds, and code samples, the official Vertex AI documentation is your source of truth.
Review pricing details: Get a detailed breakdown of token costs, Provisioned Throughput pricing, and the latest discounts for Batch and Flex APIs on the Vertex AI pricing page.

Simpler billing, clearer savings: A FinOps guide to updated spend-based CUDs

Thu, 12 Feb 2026 17:00:00 +0000

Optimizing cloud spend is one of the most rewarding aspects of FinOps — and committed use discounts (CUDs) remain one of the most effective levers to pull.

In July 2025, we began rolling out updates to the spend-based CUD model to make it easier to understand your costs and savings, expand coverage to new SKUs (including Cloud Run and H3/M-series VMs), and offer increased flexibility. These changes are now available to all customers. Let’s dive into how this new model simplifies your FinOps practice.

1. What is the spend-based CUD data change all about?

The most important shift is the move from a credit-based system to a direct discounted price model using consumption models.

Under the old credits model, you committed to an hourly on-demand amount. To find your savings (the actual cost reduction realized), you had to use three different numbers: the full on-demand cost, the commitment fee, and the offsetting credit.

The old math:

$10.00 (On-demand) + $5.50 (Commitment fee) - $10.00 (Credit) = $5.50 (Net cost)
Savings = $10.00 (On-demand) - $5.50 (Net cost) = $4.50

With the new direct discount model, you don’t need to do that math to calculate your net costs. You commit directly to the net, discounted spend amount. Your usage is simply billed at that discounted rate.

The new math:

$5.50 (Discounted costs)
Savings = $10.00 (On-demand) - $5.50 (Discounted costs) = $4.50

You can now see your net cost at a glance, and calculating the savings only requires comparing the on-demand price ($10.00) to your new discounted cost ($5.50), which equals $4.50/hr.

2. How do I validate my savings before and after the changes?

The unified CUD Analysis tool is your best resource for auditing the migration or performing deep dives on your spend. CUD Analysis for the new spend-based CUD model allows you to quickly verify the savings you are getting with the new model, and to confirm that your savings didn’t change between the old and the new model.

You can validate your savings by following these steps:

1. Identify the date when the migration took place; you can see the migration date in the billing overview page.

2. Go to CUD Analysis to validate the savings before and after the migration.

3. To quantify costs from before the migration:

Filter the view for one day before the migration, in this case Oct. 26, 2025.
Select a CUD Product, for example Cloud SQL CUD.
In our example, we paid a $50.35 CUD fee to get a $69.12 credit. When you subtract that fee from the credit, your actual take-home savings were $18.77.

4. To validate costs after the migration:

Change the date to Oct. 28, 2025.
Under the new model, you pay the discounted rates upfront. Your dashboard will reflect a Net Cost of $50.35, compared to the $69.12 on-demand cost, clearly showing your $18.77 in savings.

In addition, this release updates Cost Reports to include “Savings Programs,” which reflects your actual net savings ($18.77 in our example above) rather than the gross credit. When comparing pre- and post-migration data in Cost Reports, ensure you include both usage SKUs and commitment fee SKUs to capture the full scope of the commitment.
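
The before-and-after check in the steps above is simple enough to script against numbers you read from CUD Analysis. This Python sketch reuses the figures from the example; the function names are hypothetical, and `Decimal` is used so that currency values compare exactly.

```python
from decimal import Decimal


def savings_old_model(cud_fee, credit):
    """Credits model: savings are the credit minus the commitment fee."""
    return credit - cud_fee


def savings_new_model(on_demand_cost, net_cost):
    """Direct discount model: savings are on-demand cost minus net cost."""
    return on_demand_cost - net_cost


# Figures from the example: one day before and one day after the migration.
before = savings_old_model(cud_fee=Decimal("50.35"), credit=Decimal("69.12"))
after = savings_new_model(on_demand_cost=Decimal("69.12"), net_cost=Decimal("50.35"))
print(before, after, before == after)
```

If the two values differ for comparable usage, that is a signal to look more closely at which usage and commitment fee SKUs are included in each view.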

3. What other capabilities are in the new CUD Analysis?

Beyond support for the new model, the new CUD Analysis tool offers deeper visibility into your CUD coverage and CUD utilization. You can now analyze your CUDs with hourly data granularity for up to 30 days. This is a major improvement for FinOps teams, as daily averages often hide underutilization spikes that occur during specific hours.
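
A small numerical example shows why hourly granularity matters. In the Python sketch below, the commitment and usage figures are invented: usage averages exactly the committed amount over the day, so a daily view suggests full utilization, while the hourly view reveals that half of the hours leave most of the commitment unused.

```python
def utilization(usage, commitment):
    """Share of an hourly commitment that the usage actually consumes."""
    return min(usage, commitment) / commitment


COMMITMENT = 8                         # hypothetical hourly commitment, in dollars
hourly_usage = [14] * 12 + [2] * 12    # busy half of the day, quiet half of the day

daily_average_usage = sum(hourly_usage) / len(hourly_usage)
daily_view = utilization(daily_average_usage, COMMITMENT)
hourly_view = sum(utilization(u, COMMITMENT) for u in hourly_usage) / len(hourly_usage)
print(daily_view, hourly_view)
```

Here the daily average reports 100% utilization, while the hourly calculation shows 62.5%, because usage above the commitment in busy hours cannot make up for idle commitment in quiet hours.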

CUD Analysis: Compute Flexible CUD coverage analysis

CUD Analysis: Per CUD purchase utilization visibility

If you want to use your own data analysis tools, we offer a new spend-based CUD metadata export that lets you manage your spend-based CUDs programmatically. You can use this export to join with the Billing BigQuery Export datasets to run in-depth, programmatic analysis on all your commitment data. You can also export a CSV from the CUD Analysis view to see the raw data for every resource and its price without needing the full BigQuery export.

4. How much commitment should I buy?

Our CUD recommendations are the primary tool for determining how much of a commitment to purchase. We recently enhanced our Compute Flexible CUD commitment recommendations to provide greater accuracy by including data from GKE, Cloud Run, Cloud Run Functions, and Compute Engine. Additionally, CUD scenario modeling allows you to adjust these suggestions in real-time. You can adjust coverage thresholds, filter out specific dates with irregular usage, or extend the lookback analysis window up to 180 days to identify the exact commitment level that aligns with your specific risk profile.
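
To illustrate the kind of trade-off that scenario modeling explores, here is a deliberately simple heuristic in Python: pick the largest hourly commitment that would be fully used in at least a target percentage of hours, optionally excluding hours with irregular usage. This is not the algorithm behind Google Cloud's CUD recommendations; the function name, parameters, and data are hypothetical.

```python
def recommend_commitment(hourly_spend, target_pct=90, excluded_hours=()):
    """Largest commitment fully used in at least target_pct percent of hours."""
    if not 1 <= target_pct <= 100:
        raise ValueError("target_pct must be between 1 and 100")
    excluded = set(excluded_hours)
    hours = sorted(s for i, s in enumerate(hourly_spend) if i not in excluded)
    if not hours:
        raise ValueError("no hours left to analyze")
    # Number of hours that must fully use the commitment (ceiling division).
    needed = -(-target_pct * len(hours) // 100)
    return hours[len(hours) - needed]


spend = [1, 6, 6, 7, 7, 8, 8, 9, 9, 10]   # hour 0 is an irregular dip
print(recommend_commitment(spend, target_pct=100))
print(recommend_commitment(spend, target_pct=100, excluded_hours={0}))
print(recommend_commitment(spend, target_pct=90))
```

Excluding the irregular hour raises the fully utilized commitment from 1 to 6, which is the same effect you get by filtering out dates with unusual usage before choosing a commitment level.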

CUD scenario modeling: experiment with multiple options to identify your ideal CUD strategy

5. Is there anything else I should know about Flex CUDs?

With the release of the new spend-based model, we’ve addressed the reporting limitation affecting customers who use a combination of Flex CUDs and GKE/Cloud Run CUDs. Previously, our analysis tools were unable to accurately identify the source of specific credits, leading to discrepancies in KPI metrics like savings, coverage, and utilization. Under the new spend-based CUD model, this limitation has been corrected, so your CUD analysis now provides an accurate, granular view of your savings per Google Cloud service.

To begin navigating the updated spend-based model, visit the Billing console. You can learn more in our documentation.

Automating FinOps cost management policies using Workload Manager

Tue, 04 Nov 2025 17:00:00 +0000

Do you find yourself battling surprise cloud bills? Do you spend more time tracking down un-tagged resources and chasing development teams than you do on strategic financial planning? In the fast-paced world of cloud, manual cost management is a losing game. It’s time-consuming, prone to errors, and often, by the time you’ve identified a cost anomaly, it's too late to prevent the impact.

What if you could codify your financial governance policies and automate their enforcement across your entire Google Cloud organization? Enter Workload Manager (WLM), a powerful tool that lets you automate the validation of your cloud workloads against best practices for security and compliance, including your own custom-defined FinOps rules. Better yet, we recently slashed the cost of using Workload Manager by up to 95% for certain scenarios, letting you run large-scale scans more economically, including a small free tier to help you run small-scale tests. In this blog, we show you how to get started with automated financial governance policies in Workload Manager, so you can stop playing catch-up and start proactively managing your cloud spend.

The challenge with manual FinOps

Managing business-critical workloads in the cloud is complex. Staying on top of cost-control best practices is a significant and time-consuming effort. Manual reviews and audits can take weeks or even months to complete, by which time costs can spiral. This manual approach often leads to "configuration drift," where systems deviate from your established cost management policies, making it difficult to detect and control spending.

Workload Manager helps you break free from these manual constraints by providing a framework for automated, continuous validation, helping FinOps teams to:

Improve standardization: Decouple team dependencies and drive consistent application of cost-control policies across the organization.
Enable ownership: Empower individual teams to build and manage their own detection rules for specific use cases, fostering a culture of financial accountability.
Simplify auditing: Easily run infrastructure checks across your entire organization and consolidate the findings into a single BigQuery dataset for streamlined reporting and analysis.

By codifying your FinOps policies, you can define them once and run continuous scans to detect violations across your entire cloud environment on a regular schedule.

Workload Manager makes this easy, providing you with out-of-the-box rules across security, cost, reliability, and more. Here are some examples of FinOps cost management policies that can be automated with Workload Manager:

Require a label or tag on a specific Google Cloud resource (e.g., a BigQuery dataset)
Enforce lifecycle management or Autoclass configuration for every Cloud Storage bucket
Ensure appropriate data retention is set for storage (e.g., BigQuery tables)
Disable simultaneous multi-threading to optimize licensing costs (e.g., SQL Server)

Figure 1: Default Workload Manager policies as per Google Cloud best practices

Don't find what you need? You can always build your own custom policies using examples in our Git repo.

Let’s take a closer look.

Automating FinOps policies: A step-by-step guide

Here’s how you can use Workload Manager to automate your cost management policies.

Step 1: Define your FinOps rules and create a new evaluation

First, you need to translate your cost management policies into a format that Workload Manager can understand. The tool uses Open Policy Agent (OPA) Rego for defining custom rules. In this blog, we focus on a primary FinOps use case: ensuring that resources are properly labeled for cost allocation and showback.

You can choose from hundreds of predefined rules authored by Google Cloud experts that cover FinOps, reliability, security, and operations according to Google Cloud best practices, or create and customize your own rules (check out the examples in the Google Cloud GitHub repository). In our example, we will use one of the predefined ‘Google Cloud Best Practices’ rules, bigquery-missing-labels, on a dataset. To do so, navigate to the Workload Manager section in your Google Cloud console and start by creating a new evaluation.

Give your evaluation a name and select "Custom" as the workload type. This is where you can point Workload Manager to the Cloud Storage bucket that contains your custom FinOps rules if you’ve built one. The experience allows you to run both pre-defined and custom rule checks in one evaluation.
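
Workload Manager custom rules are written in Rego, but the logic of a missing-labels check is easy to prototype in any language before you codify it. The Python sketch below only illustrates that logic: the required label keys, the shape of the resource dictionaries, and the function names are all assumptions, and this is not the implementation of the predefined bigquery-missing-labels rule.

```python
REQUIRED_LABELS = {"cost-center", "owner"}   # hypothetical policy


def missing_labels(resource, required=REQUIRED_LABELS):
    """Return the required label keys that a resource does not have."""
    labels = resource.get("labels") or {}
    return sorted(required - set(labels))


def find_violations(resources):
    """Map resource name to its missing labels, for non-compliant resources."""
    violations = {}
    for resource in resources:
        missing = missing_labels(resource)
        if missing:
            violations[resource["name"]] = missing
    return violations


# Hypothetical BigQuery datasets described as plain dictionaries.
datasets = [
    {"name": "sales", "labels": {"cost-center": "42", "owner": "data"}},
    {"name": "scratch", "labels": {"owner": "ml"}},
    {"name": "tmp"},
]
violations = find_violations(datasets)
print(violations)
```

Prototyping the check this way can help you agree on the policy with stakeholders before you write or adapt the corresponding rule.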

Figure 2: Creating new evaluation rule

Step 2: Define the scope of your scan

Next, define the scope of your evaluation. You have the flexibility to scan your entire Google Cloud organization, specific folders, or individual projects. This allows you to apply broad cost-governance policies organization-wide, or create more targeted rules for specific teams or environments. You can also apply filters based on resource labels or names for more granular control. In this example, region selection lets you select where you want to process your data to meet data residency requirements.

Figure 3: Selecting scope and location for your evaluation rule

Step 3: Schedule and notify

With FinOps, automation is key. You can schedule your evaluation to run at a specific cadence, from hourly to monthly. This helps ensure continuous monitoring and provides a historical record of your policy compliance. Optionally, but highly recommended for FinOps, you can configure the evaluation to save all results to a BigQuery dataset for historical analysis and reporting.

You can also set up notifications to alert the right teams when an issue is found. Channels include email, Slack, PagerDuty, and more, so that policy violations can be addressed promptly.

Figure 4: Export, schedule and notify evaluation rules

Step 4: Run, review, and report

Once saved, the evaluation will run on your defined schedule, or you can trigger it on-demand. The results of each scan are stored, providing a historical view of your compliance posture.

From the Workload Manager dashboard, you can see a summary of scanned resources, issues found, and trends over time. For deeper analysis, you can explore the violation data directly in the BigQuery dataset you configured earlier.

Figure 5: Check out evaluations for Workload Manager

Visualize findings with Looker Studio

To make the data accessible and actionable for all stakeholders, you can easily connect your BigQuery results to Looker Studio. Create interactive dashboards that visualize your FinOps policy violations, such as assets missing required labels or resources that don't comply with cost-saving rules. This provides a clear, at-a-glance view of your cost governance status.

You can find a Looker Studio template in the template gallery, connect it to your datasets, and modify it as needed. Here is how to use it:

Go to Looker Studio.
Navigate to Templates and, under BigQuery, select Google Cloud Workload Manager.
Click “Use your own data,” which prompts you to connect the BigQuery table generated in the previous steps.
After you have connected the BigQuery dataset, click Edit to create a customizable copy that you can modify or share with your team.

Figure 6: Set up preconfigured Looker Studio dashboard for reporting

Take control of your cloud costs today

Stop the endless cycle of manual cloud cost management. With Workload Manager, you can embed your FinOps policies directly into your cloud environment, automate enforcement, and provide teams with the feedback they need to stay on budget.

Ready to get started? Explore the sample policies on GitHub and check out the official documentation to begin automating your FinOps framework today, and take advantage of Workload Manager’s new pricing.

Check out a quick overview video on how Workload Manager Evaluations helps you do a lot more across Security, Reliability and FinOps.

Then, review the updated pricing to learn more.

Announcing the General Availability of Smarter, AI-powered Cost Anomaly Detection

Mon, 03 Nov 2025 17:00:00 +0000

Last year, we announced the public preview of Cost Anomaly Detection, an AI-powered product designed to eliminate one of the biggest anxieties of using the Cloud: unexpected costs. The goal was to provide a safety net that automatically identifies unusual spikes in spending, helping you catch issues before they become financial problems.

Today, we are excited to announce that Cost Anomaly Detection is now generally available (GA), and it is more proactive, intelligent, and flexible. Best of all, anomaly alerts are now on by default for every customer across all projects, including the new ones, offering complete protection from day one.

What’s new in general availability?

For the GA release, we focused on making the service smarter, more automatic, more proactive, and more customizable to suit your specific needs. Here’s what’s new:

1. Auto-alerts by default

Insights into any deviations in your cloud costs should be the default. Protection from cost overruns should be constant and not require any configuration from your end. That's why we’ve automatically enabled anomaly alerts for all customers on all their projects. Default alerts will be sent to Billing Administrators; you can, of course, easily visit the billing console to manage and customize your alert preferences at any time. The alerts will take you to the Anomaly dashboard on the billing console, where you can easily see all the details related to the cost spike including the root causes.

Anomaly Dashboard with Root Cause Analysis

Default alert configuration

2. Intelligent, AI-generated thresholds

Will auto-alerts mean more noise and email spam? No. Our improved algorithm now provides automated, AI-generated anomaly thresholds based on your historical spending patterns. This intelligent baseline ensures you are only alerted to spikes that seem significant and unexpected, relative to your spend behavior.

Default threshold configuration

And while the AI-generated thresholds work out of the box, you still have the flexibility to override them with your own custom values, if needed. Customers who have already configured their own custom values but would like to leverage our AI-generated thresholds, can easily do so from the billing console at any time.

3. More flexible filtering with percentage deviation

We heard your feedback that every project has a different sensitivity to cost spikes. A $100 deviation might be critical for a small project but expected noise for a large one. To address this, we’ve introduced an additional threshold for percentage deviation that filters your anomaly dashboard and alerts not only on an absolute dollar value but also on a percentage change. This allows your alerts to stay relevant to your budget and scale.

Custom threshold configuration

Don't worry — all anomalies are still captured and can be viewed at any time by simply removing the filters from your dashboard.
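
To make the combined filter concrete, here is a minimal Python sketch of filtering on both an absolute dollar threshold and a percentage deviation. The Anomaly class, the filter_anomalies function, the sample projects, and the rule that an anomaly must clear both thresholds are assumptions made for illustration; they are not the product's actual data model or API.

```python
from dataclasses import dataclass


@dataclass
class Anomaly:
    # Hypothetical record; field names are illustrative, not a billing API schema.
    project: str
    expected_cost: float  # baseline spend for the period, in USD
    actual_cost: float    # observed spend for the period, in USD

    @property
    def absolute_deviation(self) -> float:
        return self.actual_cost - self.expected_cost

    @property
    def percent_deviation(self) -> float:
        # Percentage change relative to the expected baseline.
        if self.expected_cost == 0:
            return float("inf") if self.actual_cost > 0 else 0.0
        return 100.0 * self.absolute_deviation / self.expected_cost


def filter_anomalies(anomalies, min_absolute_usd, min_percent):
    """Keep anomalies that clear BOTH thresholds (an assumption of this sketch)."""
    return [
        a for a in anomalies
        if a.absolute_deviation >= min_absolute_usd
        and a.percent_deviation >= min_percent
    ]


anomalies = [
    Anomaly("small-project", expected_cost=50.0, actual_cost=150.0),
    Anomaly("large-project", expected_cost=100_000.0, actual_cost=100_100.0),
]
flagged = filter_anomalies(anomalies, min_absolute_usd=100.0, min_percent=10.0)
print([a.project for a in flagged])
```

Both sample projects deviate by $100, but only the small project also clears the 10% threshold, which mirrors the $100 example above.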

4. Immediate protection from day one

During the public preview, we offered anomaly detection only on projects that were at least 6 months old, because newer projects lacked significant spend history. However, our improved algorithm now solves this "cold start" problem, making it possible to alert on anomalies even for new accounts and projects with no prior spend history. This helps ensure that you are protected on Google Cloud from the get-go.

Get started today

Cost Anomaly Detection is a core part of our FinOps capabilities that provides you with complete and predictable control over your cloud costs. When layered with Cloud Budgets, it creates a robust cost control strategy that works to prevent, detect, and contain runaway spend. And it remains free, offered as part of our comprehensive set of cost management tools.

Head over to your billing console to access this product and refer to our documentation for more details.

Three-part framework to measure the impact of your AI use case

Thu, 11 Sep 2025 16:00:00 +0000

Generative AI is no longer just an experiment. The real challenge now is quantifying its value. For leaders, the path is clear: make AI projects drive business growth, not just incur costs. Today, we'll share a simple three-part plan to help you measure the effect and see the true worth of your AI initiatives.

This methodology connects your technology solution to a concrete business outcome. It creates a logical narrative that justifies investment and measures success.

1. Define what success looks like (the value)

The first step is to define the project's desired outcome by identifying its "value drivers." For any AI initiative, these drivers typically fall into four universal business categories:

Operational efficiency & cost savings: This involves quantifying improvements to core business processes. Value is measured by reducing manual effort, optimizing resource allocation, lowering error rates in production or operations, or streamlining complex supply chains.
Revenue & growth acceleration: While many organizations initially focus on efficiency, true market leadership is achieved through growth. This category of value drivers is the critical differentiator, as it focuses on top-line impact. Value can come from accelerating time-to-market for new products, identifying new revenue streams through data analysis, or improving sales effectiveness and customer lifetime value.
Experience & engagement: This captures the enhancement of human interaction with technology. It applies broadly to improving customer satisfaction (CX), boosting employee productivity and morale with intelligent tools (EX), or creating more seamless partner experiences.
Strategic advancement & risk mitigation: This covers long-term competitive advantages and downside protection. Value drivers include accelerating R&D cycles, gaining market-differentiating insights from proprietary data, strengthening operational resiliency, or ensuring regulatory compliance and reducing fraud.

2. Specify what it costs to succeed (your investment)

The second part of the framework demands transparency regarding the investment. This requires a complete view of the Total Cost of Ownership (TCO), which extends beyond service fees to include model training, infrastructure, and the operational support needed to maintain the system. For a detailed guide, we encourage a review of our post, How to calculate your AI costs on Google Cloud.

3. State the ROI

This is the synthesis of the first two steps. The ROI calculation makes the business case explicit by stating the time required to pay back the initial investment and the ongoing financial return the project will generate.

The framework in action: An AI chatbot for customer service

Now, let's apply the universal framework to a specific use case. Consider an e-commerce company implementing an AI chatbot. Here, the four general value drivers become tailored to the world of customer service.

Step 1: Define success (the value)
The team uses the customer-service-specific quadrants to build a comprehensive value estimate.

Quadrant 1: Operational efficiency
- Reduced agent handling time: By automating 60% of routine inquiries, the company frees up thousands of agent hours. This enables agents to serve more customers or perhaps provide better quality service to premium customers.
  - Estimated hours saved: ~725 hrs (let's say this equates to $15,660 in value)
- Lower onboarding & training costs: New agents become productive faster as the AI handles the most common questions, reducing the burden of repetitive training.
  - Estimated monthly value: $1,000
Quadrant 2: Revenue growth
- 24/7 Sales & support: The chatbot assists customers and captures sales leads around the clock, converting shoppers who would otherwise leave.
  - Estimated monthly value: $5,000
- Improved customer retention: Faster resolution and a better experience lead to a small, measurable increase in customer loyalty and repeat purchases.
  - Estimated monthly value: $1,000
Quadrant 3: Customer and employee experience
- Enhanced agent experience & retention: Human agents are freed from monotonous tasks to focus on complex, rewarding problems. This improves morale and reduces costly agent turnover.
  - Estimated monthly value: $500
Quadrant 4: Strategic enablement
- Expanding business to more languages: Enabling human agents to provide support in 15+ additional languages, thanks to the translation service built into the system.
  - Estimated revenue increase: $1,750
Total estimated monthly value = $15,660 + $1,000 + $5,000 + $1,000 + $500 + $1,750 = $24,910

Step 2: Define the cost (the investment)
Following a TCO analysis from our earlier blog post, we calculated the total ongoing monthly cost for the fully managed AI solution on Google Cloud would be approximately $2,700.

Step 3: State the ROI
The final story was simple and powerful. With a monthly value of around $25,000 and a cost of only $2,700, the project generated significant positive cash flow. The initial setup cost was paid back in less than two weeks, securing an instant "yes" from leadership.
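
The arithmetic behind this example can be sketched in a few lines of Python. The dollar figures come from the steps above; the dictionary keys and the payback_days helper are illustrative names. The post does not state the initial setup cost, so the helper takes it as an input rather than assuming a value.

```python
# Monthly value estimates from Step 1 of the example, in USD.
monthly_value = {
    "reduced_agent_handling_time": 15_660,
    "lower_onboarding_and_training": 1_000,
    "sales_and_support_24_7": 5_000,
    "improved_customer_retention": 1_000,
    "agent_experience_and_retention": 500,
    "additional_languages": 1_750,
}
monthly_cost = 2_700  # Step 2: ongoing monthly cost, in USD

total_value = sum(monthly_value.values())
net_benefit = total_value - monthly_cost
value_to_cost = total_value / monthly_cost


def payback_days(setup_cost, net_monthly_benefit, days_per_month=30):
    """Days needed for the net monthly benefit to cover a one-time setup cost."""
    if net_monthly_benefit <= 0:
        raise ValueError("no payback: net monthly benefit is not positive")
    return setup_cost / (net_monthly_benefit / days_per_month)


print(f"Total monthly value: ${total_value:,}")
print(f"Net monthly benefit: ${net_benefit:,}")
print(f"Value-to-cost ratio: {value_to_cost:.1f}x")
```

To estimate the payback period for your own project, call payback_days with your setup cost and the net_benefit computed here.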

Get started

Introducing no-cost, multicloud Data Transfer Essentials for EU and U.K. customers

Wed, 10 Sep 2025 05:00:00 +0000

At Google Cloud, our services are built with interoperability and openness in mind to enable customer choice and multicloud strategies. We pioneered a multicloud data warehouse, enabling workloads to run across clouds. We were the first company to provide digital sovereignty solutions for European governments and to waive exit fees for customers who stop using Google Cloud.

We continue this open approach with the launch today of our new Data Transfer Essentials service for customers in the European Union and the United Kingdom. Built in response to the principles of cloud interoperability and choice outlined in the EU Data Act, Data Transfer Essentials is a new, simple solution for data transfers between Google Cloud and other cloud service providers. Although the Act allows cloud providers to pass through costs to customers, Data Transfer Essentials is available today at no cost to customers.

Designed for “in-parallel” processing of workloads belonging to the same organization that are distributed across two or more cloud providers, Data Transfer Essentials enables you to build flexible, multicloud strategies and use the best-of-breed solutions across different cloud providers. This can foster greater digital operational resilience – without incurring outbound data transfer costs from Google Cloud.

To get started, please read our configuration guide to learn how to opt in and specify your multicloud traffic. Qualifying multicloud traffic will be metered separately, and will appear on your bill at a zero charge, while all other traffic will continue to be billed at existing Network Service Tier rates.

The original promise of the cloud is one that is open, elastic, and free from artificial lock-ins. Google Cloud continues to embrace this openness and the ability for customers to choose the cloud service provider that works best for their workload needs. Read more about Data Transfer Essentials here.

Save more with expanded coverage for Compute Flex CUDs

Fri, 05 Sep 2025 16:00:00 +0000

We’re excited to announce an expansion to our Compute Flexible Committed Use Discounts (Flex CUDs), providing you with greater flexibility across your cloud environment. Your spend commitments now stretch further and cover a wider array of Google Cloud services and VM families, translating into greater savings for your workloads.

Flex CUDs are spend-based commitments that provide deep discounts on Google Cloud compute resources in exchange for a one or three-year term. This model offers maximum flexibility, automatically applying savings across a broad pool of eligible VM families and regions without being tied to a single resource.

More power, more savings with expanded coverage

We understand that modern applications are built on a diverse mix of services, from massive databases to nimble serverless functions. To better support the way you build, we’re expanding Flex CUDs to cover more of the specialized and serverless solutions you use every day:

Memory-optimized VM Families: We’re bringing enhanced discounts to our memory-optimized M1, M2, M3 and the new M4 VM families. Now you can get more value from critical workloads like SAP HANA, in-memory analytics platforms and high-performing databases.
High-performance computing (HPC) VM families: For compute-intensive workloads, Flex CUDs now apply to our HPC-optimized H3 and the new H4D VM families, perfect for complex simulations and scientific research.
Cloud Run and Cloud Functions: For developers and organizations that use Cloud Run's fully managed platform, we are extending Flex CUDs’ coverage to Cloud Run request-based billing and Cloud Run functions.

Why this matters

This expansion of Compute Flex CUDs is designed with your growth and efficiency in mind:

Maximize your spend commitments: Instead of being tied to a specific resource type or region, your committed spend can now be applied across a larger portion of your Google Cloud usage. This means less "wasted" commitment and more active savings.
Enhanced financial predictability and control: With greater coverage, you gain a clearer picture of your anticipated cloud spend, making budgeting and financial planning more predictable.
Simplified cost management: A single, flexible commitment can now cover a more diverse set of services, streamlining your financial operations and reducing the complexity of managing multiple, granular commitments.
Fuel innovation: By reducing the cost of core compute and serverless services, you free up budget that can be reinvested into innovation.

An updated Billing model

Compute Flex CUDs’ expanded coverage is made possible by the new and improved spend-based CUDs model, which streamlines how discounts are applied and provides greater flexibility. Enabling this feature triggers some experience changes to the Billing user interface, Cloud Billing export to BigQuery schema, and Cloud Commerce Consumer Procurement API. This new billing model is simpler: we directly charge the discounted rate for CUD-eligible usage, reflecting the applicable discount, instead of using credits to offset usage and reflect savings. It’s also more flexible: we apply discounts to a wider range of products within spend-based CUDs. For more, this follow-up resource details the updates, including information on a sample export to preview your monthly bill in the new format, key CUD KPIs, new SKUs added to CUDs, and CUD product information. You can learn more about these changes in the documentation.
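
As a rough illustration of the two presentations described above, the Python sketch below bills the same CUD-eligible usage both ways. The function names, the 25% discount rate, and the $1,000 usage figure are hypothetical, chosen only to keep the arithmetic simple; they are not published Flex CUD rates or actual billing export fields. The sketch also ignores commitment fees and usage beyond the commitment.

```python
def bill_with_credits(list_price_usage, discount_rate):
    """Earlier presentation: charge list price, then offset with a credit line."""
    credit = -list_price_usage * discount_rate
    return {
        "usage_charge": list_price_usage,
        "cud_credit": credit,
        "net_cost": list_price_usage + credit,
    }


def bill_with_discounted_rate(list_price_usage, discount_rate):
    """New presentation: charge the discounted rate directly, with no credit line."""
    net = list_price_usage * (1 - discount_rate)
    return {"usage_charge": net, "net_cost": net}


# Hypothetical inputs: $1,000 of eligible usage at list price, 25% discount.
old_view = bill_with_credits(1_000.0, 0.25)
new_view = bill_with_discounted_rate(1_000.0, 0.25)
print(old_view)
print(new_view)
```

Under these simplifying assumptions the net cost is the same in both views; what changes is that the discount shows up in the charged rate instead of in a separate credit line.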

Availability and next steps

At Google Cloud, we’re committed to providing you with the most flexible and cost-effective solutions for your evolving cloud needs. This expansion of Compute Flex CUDs is a testament to that commitment, enabling you to build, deploy, and scale your applications with even greater financial efficiency. Starting today, you can opt in and begin enjoying Compute Flex CUDs’ expanded scope and improved billing model.

Starting January 21, 2026, all customers will be automatically transitioned to the new spend-based model to take advantage of these expanded Flex CUDs. If you don’t opt in to multi-price CUDs, these changes will be automatically applied on January 21, 2026. New customers who create a Billing Account on or after July 15, 2025 will automatically be under the new billing model for Flex CUDs. Stay tuned for more updates as we continue to enhance our offerings to support your success on Google Cloud.

Google is a Leader in the 2025 IDC MarketScape: FinOps Cloud Costs Optimization

Tue, 05 Aug 2025 16:00:00 +0000

Our customers come first, and we’ve focused on building FinOps tools that help them understand their cloud spend, optimize for efficiency, and prevent cost surprises. We’re excited to be recognized for this work, and named a leader in the 2025 IDC MarketScape for FinOps cloud cost optimization.

"This study evaluated the five global hyperscalers and their FinOps cloud cost optimization capabilities, assessing several dimensions within strategy and product capabilities. A strength of Google Cloud FinOps is its integration with Gemini, helping customers mature, automate, and accelerate cost optimization. This is in addition to the thought leadership Google has been demonstrating with its product strategy and driving industry support for open standards." - Jevin Jensen, IDC Research Vice President and triple-certified FinOps practitioner and engineer

Here are our top 10 FinOps innovations that are helping Google Cloud customers:

1. We stream net-cost data in real time, so your information stays current with actual cloud costs. 99% of that data arrives within 24 hours, and many services update several times a day.

2. We provide granular, sub-resource cost data out-of-the-box – without additional hoops to jump through, like agent installs – which means you can understand your cost drivers faster. For example, for more than two years, we have broken up Kubernetes costs into clusters, namespaces, and pods.

3. The FinOps Hub centralizes all cost optimization activities in one place, highlighting inefficiencies so business professionals can collaborate with development teams to drive meaningful change.

FinOps Hub in action

4. We integrated generative AI into FinOps workflows early, creating specialized business use cases for all users. This saves time when finding cost insights and optimization opportunities, with grounded answers to ensure accuracy and relevance.

Gemini Cloud Assist for FinOps in action.

5. We focus on the FinOps user, and the rest follows. Over the years, we have built up an amazing group of FinOps practitioners we work closely with to evolve our FinOps products. We also have a FinOps executive advisory board that allows us to look forward and understand where the industry is evolving.

6. We believe we can make billing enjoyable. The microinteractions, zero states, guided tours, and elegant material design, all work together to create experiences that feel intuitive and Googley.

7. We provide customers a FinOps score to help you make data-informed decisions when building business cases for committed use discounts or identifying spend that needs better organization through tagging or budget coverage. Using this score you can see how you benchmark against peers.

Google Cloud customers get their own FinOps score and can see how they compare with their peers.

8. We have fast cost-anomaly detection that runs hourly, with high precision. And we also offer root cause analysis information for our users to take action quickly.

9. We provide real-time scenario modelling for rate optimizations, managing terabytes of data in our UI quickly and easily. Customer controls let you shape and model the data as needed.

FinOps Hub scenario modelling in action.

10. We provide these FinOps tools at no additional charge to Google Cloud customers. We don’t charge extra for extended data lookback windows, UI views and analysis, or FinOps Hub cost optimizations. This helps customers spend less time on understanding their bills and more time driving business innovation.

Read the full IDC MarketScape excerpt to learn more about our capabilities.

^{Source: “IDC MarketScape: Worldwide FinOps Cloud Costs Optimization Hyperscalers 2025 Vendor Assessment” by Jevin Jensen, July 2025, IDC #US53679825}

^{IDC MarketScape vendor analysis model is designed to provide an overview of the competitive fitness of ICT suppliers in a given market.  The research methodology utilizes a rigorous scoring methodology based on both qualitative and quantitative criteria that results in a single graphical illustration of each vendor’s position within a given market. The Capabilities score measures vendor product, go-to-market and business execution in the short-term. The Strategy score measures alignment of vendor strategies with customer requirements in a 3-5-year timeframe. Vendor market share is represented by the size of the circles. Vendor year-over-year growth rate relative to the given market is indicated by a plus, neutral or minus next to the vendor name.}

Optimize your cloud costs using Cloud Hub Optimization and Cost Explorer

Mon, 04 Aug 2025 16:00:00 +0000

Application owners are looking for three things when they think about optimizing cloud costs:

What are the most expensive resources?
Which resources are costing me more this week or month?
Which resources are poorly utilized?

To help you answer these questions quickly and easily, we announced Cloud Hub Optimization and Cost Explorer, in private preview, at Google Cloud Next 2025. And today, we are excited to announce that both Cloud Hub Optimization and Cost Explorer are now in public preview.

Application cost and utilization

As an app owner, your primary objective is keeping your application healthy at all times. Yet monitoring all the individual components of your application, which may straddle dozens of Projects, can be quite overwhelming. App Hub applications allow you to reorganize your cloud resources around your application, giving you the information and controls you need at your fingertips.

In addition to supporting Google Cloud Projects, Cloud Hub Optimization and Cost Explorer leverage App Hub applications to show you the cost-efficiency of your application’s workloads and services instantly. This is useful, for instance, when you are trying to pinpoint deployments running on GKE clusters that might be wasting valuable resources, such as GPUs.

Not just another cost dashboard

When you bring up Cloud Hub Optimization, you can immediately see the resources that are costing you the most, along with the percentage change in their cost. With this highly granular cost information, you can now attribute your costs to specific resources and resource owners to reason about any changes in costs.

We have additionally integrated granular cost data from Cloud Billing and resource utilization data from Cloud Monitoring to give you a comprehensive picture of your cost efficiency. This includes average vCPU utilization for your Project, which helps you find the most promising optimization candidates across hundreds of Google Cloud Projects.

The Cost Explorer dashboard also shows you your costs logically organized at the product level, for even more cost explainability. Instead of seeing a lump sum cost for Compute Engine, you can now see your exact spend on individual products including Google Kubernetes Engine (GKE) clusters, Persistent Disks, Cloud Load Balancing, and more.

Simple is powerful

Customers who have tried these new tools love the information that is surfaced as well as the simplicity of the interfaces.

“My team has to keep an eye on cloud costs across tens of business units and hundreds of developers. The Cloud Hub Optimization and Cost Explorer dashboards are a force multiplier for my team as they tell us where to look for cost savings and potential optimization opportunities.” - Frank Dice, Principal Cloud Architect, Major League Baseball

Customers especially appreciate the breadth of product coverage available out of the box without any additional setup, and the fact that there is no additional charge to using these features.

What’s next

As your organization “shifts left” on cloud cost management, we are working to help application owners and developers understand and optimize their cloud costs. You can try Cloud Hub Optimization and Cost Explorer here.

You can also see a live demo of how Cloud Hub Optimization and Cost Explorer can be used to identify underutilized GKE clusters within seconds in the Google Cloud Next 2025 talk Maximize Your Cloud ROI.

^{Major League Baseball trademarks and copyrights are used with permission of Major League Baseball. Visit MLB.com.}

Spring cleaning with FinOps Hub 2.0

Wed, 16 Apr 2025 16:00:00 +0000

Spring is a great reminder to spring clean – an annual tradition that should extend not only to your household, but also to your virtual cloud infrastructure. Why not start with Google Cloud’s FinOps Hub?

As Google Cloud customers have adopted the FinOps hub to guide their optimization initiatives, we started getting additional feedback from our business community. For example, while DevOps users have access to tools and utilization metrics to identify waste, business teams often lack clear insights into resource consumption, leading to a significant blind spot. The most recent State of FinOps 2025 Report reinforces this need, underscoring the importance of workload optimization and waste reduction as the #1 Top FinOps concern. It’s extremely difficult to optimize workloads or applications if customers cannot fully understand how much is even being used. Why purchase a committed use discount for compute cores that you might not even be fully using?

Sometimes the easiest optimization our customers can make is simply to use the resources they are already paying for more efficiently. That’s why, in 2025, we are focused on the deep clean of your optimization opportunities and have upgraded FinOps Hub to help you find, highlight, and eliminate wasted spend.

1. Find waste: FinOps Hub 2.0 now comes with new utilization insights to zero in on optimization opportunities.

At Google Cloud Next 2025, we introduced FinOps Hub 2.0, focused exclusively on bringing utilization insights on your resources to the forefront so you can see what potential waste may exist and take action immediately. Waste can come in many forms: from a VM that is barely getting used at 5% (overprovisioned), to a GKE cluster that is actually running hot at 110% utilization and might fail (underprovisioned), to managed resources like Cloud Run instances that may not be optimally configured (suboptimal configuration) or, worse yet, a VM that might not ever have been used (idle). FinOps users can now quickly view the most expensive waste category in one, easy-to-understand heatmap by service or AppHub application. But FinOps Hub doesn’t just show you where there may be waste; it also includes more cost optimizations for Kubernetes Engine (GKE), Compute Engine (GCE), Cloud Run, and Cloud SQL to remedy the waste too.
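
The utilization-based waste categories above can be pictured with a small Python sketch. The classify_utilization function and its threshold values are hypothetical and chosen only so that the examples in the text (5%, 110%, never used) land in the categories named; FinOps Hub's actual criteria are not described here. Suboptimal configuration is left out because it cannot be inferred from a utilization percentage alone.

```python
def classify_utilization(
    utilization_pct,
    idle_at_or_below=0.0,          # hypothetical threshold
    overprovisioned_below=20.0,    # hypothetical threshold
    underprovisioned_above=100.0,  # hypothetical threshold
):
    """Map a utilization percentage to one of the waste categories in the text."""
    if utilization_pct <= idle_at_or_below:
        return "idle"
    if utilization_pct < overprovisioned_below:
        return "overprovisioned"
    if utilization_pct > underprovisioned_above:
        return "underprovisioned"
    return "ok"


# Illustrative resources and utilization figures echoing the examples above.
resources = {"unused-vm": 0.0, "quiet-vm": 5.0, "hot-gke-cluster": 110.0, "steady-vm": 60.0}
labels = {name: classify_utilization(pct) for name, pct in resources.items()}
print(labels)
```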

Waste map showing identified resources with their corresponding utilization metrics

2. Highlight waste: Gemini Cloud Assist supercharges FinOps Hub to summarize optimization insights and send opportunities to engineering.

But perhaps what really makes this a 2.0 release is that we supercharged the most time-consuming tasks on FinOps Hub with Gemini Cloud Assist. Our first launch of Gemini Cloud Assist, which helps create personalized cost reports and synthesize insights, has saved our customers more than 100,000 FinOps hours annually (from January 2024 to January 2025). The power of Gemini Cloud Assist to supercharge and automate workflows is a huge benefit, so we applied that to FinOps Hub in two ways. First, FinOps users can now see embedded optimization insights on the hub itself – similar to cost reports – so you don’t need to solve the “needle in the haystack” problem of optimization. Second, you can now use Gemini Cloud Assist to summarize and send top waste insights to your engineering teams to take action and remediate fast.

Gemini summary and draft emails with top optimization opportunities

3. Eliminate waste: introducing a NEW IAM role permission for your tech solution owners to see & directly take action on these optimization opportunities.

Finally, perhaps our most exciting feature – and long overdue for FinOps – is that we are unlocking access to the Billing console for tech solution owners, so that these owners can get FinOps insights and Gemini Cloud Assist insights across all their projects, in a single pane. For example, if you want to give access to FinOps Hub or cost reports to an entire department that only uses a subset of projects for their infrastructure – without providing them with broader billing data access, but still allowing them to see all of their data in a single view – now you can, with multi-project views in the billing console. Multi-project views are enabled using the new Project Billing Costs Manager IAM role (or related granular permissions). These new permissions are currently in private preview, so sign up to get access. Now you can truly extend the power of FinOps tools across your organization with these new access controls.

So take this Spring to try FinOps Hub 2.0 with Gemini Cloud Assist, and do some spring cleaning on your cloud infrastructure, because as the saying goes, “With clouds overgrown, like winter’s old grime, Spring clean your servers, save dollars and time.” – well at least that’s what they say according to Gemini.

How to calculate your AI costs on Google Cloud

Mon, 03 Mar 2025 17:00:00 +0000

What is the true cost of enterprise AI?

As a technology leader and a steward of company resources, understanding these costs isn't just prudent – it's essential for sustainable AI adoption. To help, we’ll unveil a comprehensive approach to understanding and managing your AI costs on Google Cloud, ensuring your organization captures maximum value from its AI investments.

Whether you're just beginning your AI journey or scaling existing solutions, this approach will equip you with the insights needed to make informed decisions about your AI strategy.

Why understanding AI costs matters now

Google Cloud offers a vast and ever-expanding array of AI services, each with its own pricing structure. Without a clear understanding of these costs, you risk budget overruns, stalled projects, and ultimately, a failure to realize the full potential of your AI investments. This isn't just about saving money; it's about responsible AI development – building solutions that are both innovative and financially sustainable.

Breaking down the Total Cost of Ownership (TCO) for AI on Google Cloud

Let's dissect the major cost components of running AI workloads on Google Cloud:

Cost category	Description	Google Cloud services (Examples)
Model serving cost	The cost of running your trained AI model to make predictions (inference). This is often a per-request or per-unit-of-time cost.	OOTB models available in Vertex AI, Vertex AI Prediction, GKE (if self-managing), Cloud Run Functions (for serverless inference)
Training and tuning costs	The expense of training your AI model on your data and fine-tuning it for optimal performance. This includes compute resources (GPUs/TPUs) and potentially the cost of the training data itself.	Vertex AI Training, Compute Engine (with GPUs/TPUs), GKE or Cloud Run (with GPUs/TPUs)
Cloud hosting costs	The fundamental infrastructure costs for running your AI application, including compute, networking, and storage.	Compute Engine, GKE or Cloud Run, Cloud Storage, Cloud SQL (if your application uses a database)
Training data storage and adapter layers costs	The cost of storing your training data and any "adapter layers" (intermediate representations or fine-tuned model components) created during the training process.	Cloud Storage, BigQuery
Application layer and setup costs	The expenses associated with any additional cloud services needed to support your AI application, such as API gateways, load balancers, monitoring tools, etc.	Cloud Load Balancing, Cloud Monitoring, Cloud Logging, API Gateway, Cloud Functions (for supporting logic)
Operational support cost	The ongoing costs of maintaining and supporting your AI model, including monitoring performance, troubleshooting issues, and potentially retraining the model over time.	Google Cloud Support, internal staff time, potential third-party monitoring tools

Let’s estimate costs with an example

Let's illustrate this with a hypothetical, yet realistic, generative AI use case: Imagine you’re a retail customer with an automated customer support chatbot.

Scenario: A medium-sized e-commerce company wants to deploy a chatbot on their website to handle common customer inquiries (order status, returns, product information and more). They plan to use a pre-trained language model (like one available through Vertex AI Model Garden) and fine-tune it on their own customer support data.

Assumptions:

Model: Fine-tuning a low latency language model (in this case we will use Gemini 1.5 Flash).
Training data: 1 million customer support conversations (text data).
Traffic: 100K chatbot interactions per day.
Hosting: Vertex AI Prediction for serving the model.
Fine-tuning frequency: Monthly.

Cost estimation

As the retail customer in this example, here’s how you might approach this.

1. First, discover your model serving cost:

Vertex AI Prediction pricing for Gemini 1.5 Flash (chat) is modality-based. Because the input and output in this case are text, the usage unit is characters. Let's assume an average of 1,000 input characters and 500 output characters per interaction.
Cost per 1M characters input: $0.0375.
Cost per 1M characters output: $0.15
Input cost per day: 100,000 interactions * 1000 characters * $0.0375 / 1000000 = $3.75
Output cost per day: 100,000 interactions * 500 characters * $0.15 / 1000000 = $7.50
Total model serving cost per day: $11.25
Total model serving cost per month (~30 days): ~$337
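The serving arithmetic above can be reproduced in a few lines. This is a minimal sketch that only restates the numbers from this example; the constant and function names are illustrative, and the prices are the ones quoted above rather than values read from a live price list.

```python
# Inputs taken from the example above; prices are USD per 1M characters.
INTERACTIONS_PER_DAY = 100_000
INPUT_CHARS_PER_INTERACTION = 1_000
OUTPUT_CHARS_PER_INTERACTION = 500
INPUT_PRICE_PER_M_CHARS = 0.0375
OUTPUT_PRICE_PER_M_CHARS = 0.15
DAYS_PER_MONTH = 30  # approximation used in the example


def daily_cost(interactions: int, chars_per_interaction: int, price_per_m_chars: float) -> float:
    """Daily cost in USD for one direction (input or output) of traffic."""
    total_chars = interactions * chars_per_interaction
    return total_chars * price_per_m_chars / 1_000_000


input_cost_per_day = daily_cost(
    INTERACTIONS_PER_DAY, INPUT_CHARS_PER_INTERACTION, INPUT_PRICE_PER_M_CHARS
)
output_cost_per_day = daily_cost(
    INTERACTIONS_PER_DAY, OUTPUT_CHARS_PER_INTERACTION, OUTPUT_PRICE_PER_M_CHARS
)
serving_cost_per_day = input_cost_per_day + output_cost_per_day
serving_cost_per_month = serving_cost_per_day * DAYS_PER_MONTH

print(f"Input cost per day:     ${input_cost_per_day:.2f}")
print(f"Output cost per day:    ${output_cost_per_day:.2f}")
print(f"Serving cost per day:   ${serving_cost_per_day:.2f}")
print(f"Serving cost per month: ${serving_cost_per_month:.2f}")
```

Changing the traffic or character assumptions in one place makes it easy to see how sensitive the monthly figure is to each input.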

Serving cost of the Gemini 1.5 Flash model

2. Second, identify your training and tuning costs:

In this scenario, we aim to enhance the model's accuracy and relevance to our specific use case through fine-tuning. This involves inputting a million past chat interactions, enabling the model to deliver more precise and customized interactions.

Cost per training token: $8 per 1M tokens
Cost per training character: $2 per 1M characters (assuming each token is approximately 4 characters)
Tuning cost (first month): 1,000,000 conversations (training data) * 1,500 characters (input + output) * $2 / 1,000,000 = $3,000
Tuning cost (subsequent months): 100,000 conversations (new training data) * 1,500 characters (input + output) * $2 / 1,000,000 = $300
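The tuning figures follow from converting the per-token price to a per-character price. Here is a small sketch of that calculation, using only the numbers stated in this example; the names are illustrative and the 4-characters-per-token ratio is the approximation given above.

```python
# Figures from the example above.
PRICE_PER_M_TRAINING_TOKENS = 8.0   # USD per 1M training tokens
CHARS_PER_TOKEN = 4                 # approximation stated in the example
CHARS_PER_CONVERSATION = 1_500      # input + output characters

# Convert the token price to a character price: $8 / 4 = $2 per 1M characters.
price_per_m_chars = PRICE_PER_M_TRAINING_TOKENS / CHARS_PER_TOKEN


def tuning_cost(conversations: int) -> float:
    """Tuning cost in USD for a given number of training conversations."""
    total_chars = conversations * CHARS_PER_CONVERSATION
    return total_chars * price_per_m_chars / 1_000_000


first_month_cost = tuning_cost(1_000_000)     # full historical data set
subsequent_month_cost = tuning_cost(100_000)  # new conversations only

print(f"First month tuning cost:      ${first_month_cost:,.0f}")
print(f"Subsequent month tuning cost: ${subsequent_month_cost:,.0f}")
```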

3. Third, understand the cloud hosting costs:

Since we're using Vertex AI Prediction, the underlying infrastructure is managed by Google Cloud, and its cost is included in the per-request pricing. If we were self-managing the model on GKE or Compute Engine instead, we'd need to factor in VM costs, GPU/TPU costs (if applicable), and networking costs. For this example, we assume hosting is $0, as it is part of the Vertex AI cost.

4. Fourth, define the training data storage and adapter layers costs:

The infrastructure costs for deploying machine learning models often raise concerns, but the data storage components can be economical at moderate scales. When implementing a conversational AI system, storing both the training data and the specialized model adapters represents a minor fraction of the overall costs. Let's break down these storage requirements and their associated expenses.

1M conversations, assuming an average size of 5KB per conversation, would be roughly 5GB of data.
Cloud Storage cost for 5GB is negligible: about $0.10 per month.
Adapter layers (fine-tuned model weights) might add another 1GB of storage. This would still be very inexpensive: $0.02 per month.
Total storage cost per month: < $1/month
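To make the storage estimate explicit, the sketch below recomputes it. The $0.02 per GB per month rate is an assumption implied by the figures above (5GB for about $0.10, 1GB for $0.02), not a quoted list price, and decimal units are assumed for the KB-to-GB conversion.

```python
# Figures from the example above.
CONVERSATIONS = 1_000_000
KB_PER_CONVERSATION = 5
ADAPTER_LAYERS_GB = 1

# Assumption: rate implied by the example's own numbers, not a quoted price.
PRICE_PER_GB_MONTH = 0.02

# Decimal units assumed: 1 GB = 1,000,000 KB.
training_data_gb = CONVERSATIONS * KB_PER_CONVERSATION / 1_000_000

training_data_cost = training_data_gb * PRICE_PER_GB_MONTH
adapter_layers_cost = ADAPTER_LAYERS_GB * PRICE_PER_GB_MONTH
storage_cost_per_month = training_data_cost + adapter_layers_cost

print(f"Training data: {training_data_gb:.0f} GB, ${training_data_cost:.2f}/month")
print(f"Adapter layers: {ADAPTER_LAYERS_GB} GB, ${adapter_layers_cost:.2f}/month")
print(f"Total storage: ${storage_cost_per_month:.2f}/month")
```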

5. Fifth, consider the application layer and setup costs:

This depends heavily on the specific application. In this case we use Cloud Run functions and Cloud Logging. Cloud Run functions handle pre- and post-processing of chatbot requests (e.g., formatting, database lookups). Let's assume request-based billing, so we are charged only while a request is being processed. In this example we process 3M requests per month (100K * 30) with an average execution time of 1 second: $14.30

Cloud Run function cost for request-based billing

Cloud Logging and Monitoring track chatbot performance and help debug issues. Let's estimate 100GB of logging volume (which is on the higher end), with logs retained for 3 months: $28

Cloud Logging costs for storage and retention

Total application layer cost per month: ~$40

6. Finally, incorporate the Operational support cost:

This is the hardest to estimate, as it depends on the internal team's size and responsibilities. Let's assume a conservative estimate of 5 hours per week of an engineer's time dedicated to monitoring and maintaining the chatbot, at an hourly rate of $100.

Total operational support cost per month: 5 hours/week * 4 weeks/month * $100/hour = $2000
Total estimated monthly cost (First month):
$340 (Serving) + $3000 (Training) + $1 (Storage) + $40 (Application) + $2000 (Operational) = $5,381
Total estimated monthly cost (Subsequent months):
$340 (Serving) + $300 (Training) + $1 (Storage) + $40 (Application) + $2000 (Operational) = $2,681
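The two monthly totals can be checked by summing the rounded component estimates. This is a minimal sketch using the rounded figures from this example; the dictionary keys are illustrative labels only.

```python
# Rounded monthly component estimates from the example above (USD).
HOURS_PER_WEEK = 5
WEEKS_PER_MONTH = 4
HOURLY_RATE = 100

operational_cost = HOURS_PER_WEEK * WEEKS_PER_MONTH * HOURLY_RATE

recurring_costs = {
    "serving": 340,
    "storage": 1,
    "application": 40,
    "operational": operational_cost,
}

# Only the tuning cost differs between the first and later months.
first_month_total = sum(recurring_costs.values()) + 3_000
subsequent_month_total = sum(recurring_costs.values()) + 300

print(f"Operational support: ${operational_cost:,}")
print(f"First month total:      ${first_month_total:,}")
print(f"Subsequent month total: ${subsequent_month_total:,}")
```

Operational support and first-month tuning dominate the estimate, while serving, storage, and application costs together stay under $400 per month.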

You can find the full estimate of cost here. Note that this does not include tuning and operational costs, as they are not yet available in the pricing export.

Once you have a good understanding of your AI costs, it is important to develop an optimization strategy that encompasses infrastructure choices, resource utilization, and monitoring practices to maintain performance while controlling expenses. By understanding the various cost components and leveraging Google Cloud's tools and resources, you can confidently embark on your AI journey. Cost management isn't a barrier; it's an enabler. It allows you to experiment, innovate, and build transformative AI solutions in a financially responsible way.

Get started

Start understanding your AI costs today: Explore the Google Cloud Pricing Calculator and the Vertex AI Pricing Page.
Learn more at Google Cloud Next: Register for the Google Next session on AI Investment to Impact: Unlocking Sustainable ROI with Google Cloud.
Engage Google Cloud for expert guidance: Get expert help to design cost-effective AI architectures by contacting Google Cloud Consulting or PSO.

Accelerate your cloud journey using a well-architected, principles-based framework

Fri, 14 Feb 2025 17:00:00 +0000

In today's dynamic digital landscape, building and operating secure, reliable, cost-efficient and high-performing cloud solutions is no easy feat. Enterprises grapple with the complexities of cloud adoption, and often struggle to bridge the gap between business needs, technical implementation, and operational readiness. This is where the Google Cloud Well-Architected Framework comes in. The framework provides comprehensive guidance to help you design, develop, deploy, and operate efficient, secure, resilient, high-performing, and cost-effective Google Cloud topologies that support your security and compliance requirements.

Who should use the Well-Architected Framework?

The Well-Architected Framework caters to a broad spectrum of cloud professionals. Cloud architects, developers, IT administrators, decision makers and other practitioners can benefit from years of subject-matter expertise and knowledge both from within Google and from the industry. The framework distills this vast expertise and presents it as an easy-to-consume set of recommendations.

The recommendations in the Well-Architected Framework are organized under five business-focused pillars.

We recently completed a revamp of the guidance in all the pillars and perspectives of the Well-Architected Framework to center the recommendations around a core set of design principles.

Operational excellence	Security, privacy, and compliance	Reliability	Cost optimization	Performance optimization
Operational readiness; Incident management; Resource optimization; Change management; Continuous improvement	Security by design; Zero trust; Shift-left security; Preemptive cyber-defense; Secure and responsible AI; AI for security; Regulatory, privacy, and compliance needs	User-focused goals; Realistic targets; HA through redundancy; Horizontal scaling; Observability; Graceful degradation; Recovery testing; Thorough postmortems	Spending aligned with business value; Culture of cost awareness; Resource optimization; Continuous optimization	Resource allocation planning; Elasticity; Modular design; Continuous improvement

In addition to the above pillars, the Well-Architected Framework provides cross-pillar perspectives that present recommendations for selected domains, industries, and technologies like AI and machine learning (ML).

Benefits of adopting the Well-Architected Framework

The Well-Architected Framework is much more than a collection of design and operational recommendations. The framework empowers you with a structured principles-oriented design methodology that unlocks many advantages:

Enhanced security, privacy, and compliance: Security is paramount in the cloud. The Well-Architected Framework incorporates industry-leading security practices, helping ensure that your cloud architecture meets your security, privacy, and compliance requirements.
Optimized cost: The Well-Architected Framework lets you build and operate cost-efficient cloud solutions by promoting a cost-aware culture, focusing on resource optimization, and leveraging built-in cost-saving features in Google Cloud.
Resilience, scalability, and flexibility: As your business needs evolve, the Well-Architected Framework helps you design cloud deployments that can scale to accommodate changing demands, remain highly available, and be resilient to disasters and failures.
Operational excellence: The Well-Architected Framework promotes operationally sound architectures that are easy to operate, monitor, and maintain.
Predictable and workload-specific performance: The Well-Architected Framework offers guidance to help you build, deploy, and operate workloads that provide predictable performance based on your workloads’ needs.
The Well-Architected Framework also includes cross-pillar perspectives for selected domains, industries, and technologies like AI and machine learning (ML).

The principles and recommendations in the Google Cloud Well-Architected Framework are aligned with Google and industry best practices like Google’s Site Reliability Engineering (SRE) practices, DORA capabilities, the Google HEART framework for user-centered metrics, the FinOps framework, Supply-chain Levels for Software Artifacts (SLSA), and Google's Secure AI Framework (SAIF).

Embrace the Well-Architected Framework to transform your Google Cloud journey, and get comprehensive guidance on security, reliability, cost, performance, and operations — as well as targeted recommendations for specific industries and domains like AI and ML. To learn more, visit Google Cloud Well-Architected Framework.

To avoid “bill shocks,” Palo Alto Networks deploys custom AI-powered cost anomaly detection

Mon, 09 Dec 2024 17:00:00 +0000

In today's fast-paced digital world, businesses are constantly seeking innovative ways to leverage cutting-edge technologies to gain a competitive edge. AI has emerged as a transformative force, empowering organizations to automate complex processes, gain valuable insights from data, and deliver exceptional customer experiences.

However, with the rapid adoption of AI comes a significant challenge: managing the associated cloud costs. As AI — and really cloud workloads in general — grow and become increasingly sophisticated, so do their associated costs and potential for overruns if organizations don’t plan their spend carefully.

These unexpected charges can arise from a variety of factors:

Human error and mismanagement: Misconfigurations in cloud services (e.g., accidentally enabling a higher-tiered service or changing scaling settings) can inadvertently drive up costs.
Unexpected workload changes: Spikes in traffic or usage, or changes in application behavior (e.g., marketing campaign or sudden change in user activity) can lead to unforeseen service charges.
Lack of proactive governance and cost transparency: Without a robust cloud FinOps framework, it's easy for cloud spending to spiral out of control, leading to significant financial overruns.

Organizations have an opportunity to proactively manage their cloud costs and avoid budget surprises. By implementing real-time cost monitoring and analysis, they can identify and address potential anomalies before they result in unexpected expenses. This approach empowers businesses to maintain financial control and support their growth objectives.

As one of the world’s leading cybersecurity organizations — serving more than 70,000 organizations in 150 countries — Palo Alto Networks must bring a level of vigilance and awareness to its digital business. Since it experiments often with new technologies and tools and deals with spikes in activity when threat actors mount an attack, the chances for anomalous spending run higher than most.

Recognizing the need of all its customers to effectively manage their cloud spend, Google Cloud launched Cost Anomaly Detection as part of the Cost Management toolkit. It does not require any setup, automatically detects anomalies for your Google Cloud projects, and gives teams the details they need for alerting and root-cause analysis. While Palo Alto Networks used this feature for a while and found it useful, it eventually realized the need for a customized solution. Due to stringent custom requirements, it wanted a service that could identify anomalies based on labels, such as applications or products that span across Google Cloud projects, and provide more control over the anomaly variables that are detected and alerted to its teams. Creating a consistent experience across its multicloud environments was also a priority.

Palo Alto Networks’ purpose-built solution tackles cloud management and AI costs head-on, helping the organization to be proactive at scale. It is designed to enhance cost transparency by providing real-time alerts to product owners, so they can make informed decisions and act quickly. The solution also delivers automated insights at scale, freeing up valuable time for the team to focus on innovation.

By removing the worry of unexpected costs, Palo Alto Networks can now confidently embrace new cloud and AI workloads, accelerating its digital transformation journey.

Lifecycle of an anomaly

For Palo Alto Networks, anomalies are unexpected events or patterns that deviate from the norm. In a cloud environment, anomalies can indicate anything from a simple misconfiguration to a full-blown security breach. That's why it's critical to have a system in place to detect, analyze, and mitigate anomalies before they can cause significant damage.

This flowchart illustrates the typical lifecycle of an anomaly, broken down into three key stages:

Figure 1 - Lifecycle of an Anomaly

The following sections will take a deeper dive into how Palo Alto Networks used Google Cloud to build its custom AI-powered anomaly solution to address each of these stages.

1. Detection

The first step is to identify potential anomalies. Palo Alto Networks partnered with Google Cloud Consulting to train the ARIMA+ model with billing data from its applications using BigQuery ML (BQML). The team chose this model for its strong results on time-series billing data, its customizable hyperparameters, and its cost-effective operation at scale.

The ARIMA+ model allowed Palo Alto Networks to generate a baseline spend with upper and lower bounds for its cost anomaly solution. The team also tuned the model using Palo Alto Networks’ historic billing data, enabling it to inherently understand factors like seasonality, common spikes and dips, migration patterns, and more. If the spend exceeds the upper bound created by the model, the team can then quantify the business cost impact (both percentage and dollar amount) to determine the severity of the alert to be investigated further.
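The bound check and impact calculation described above can be sketched as follows. This is an illustration only, not Palo Alto Networks' implementation: the `Forecast` type, the choice to measure impact against the expected baseline, and the severity thresholds are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Forecast:
    """Hypothetical container for one day's forecast from a time-series model."""
    expected: float  # baseline spend (USD), assumed to be greater than zero
    lower: float     # lower bound (USD)
    upper: float     # upper bound (USD)


def cost_impact(actual: float, forecast: Forecast) -> Optional[Tuple[float, float]]:
    """Return (dollar impact, percent impact) if spend exceeds the upper bound.

    Assumption: impact is measured against the expected baseline.
    Returns None when spend is within bounds.
    """
    if actual <= forecast.upper:
        return None
    dollars = actual - forecast.expected
    percent = dollars / forecast.expected * 100
    return dollars, percent


def severity(dollars: float, percent: float) -> str:
    """Map impact to a severity label. Thresholds are purely illustrative."""
    if dollars >= 1_000 or percent >= 100:
        return "high"
    if dollars >= 100 or percent >= 25:
        return "medium"
    return "low"


forecast = Forecast(expected=1_000.0, lower=800.0, upper=1_200.0)
impact = cost_impact(1_500.0, forecast)
if impact is not None:
    print(f"Impact: ${impact[0]:.2f} ({impact[1]:.0f}%), severity: {severity(*impact)}")
```

Using both a dollar and a percentage threshold keeps small projects with large relative jumps and large projects with large absolute jumps on the same alerting path.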

Figure 2 - AI-Powered Cost Anomaly Solution Architecture on Google Cloud

Looker, Google Cloud’s business intelligence platform, serves as the foundation for custom data modeling and visualization, seamlessly integrating with Palo Alto Networks’ existing billing data infrastructure, which continuously streams into BigQuery multiple times a day. This eliminates the need for additional data pipelines, ensuring the team has the most up-to-date information for analysis.

BigQuery ML empowers Palo Alto Networks with robust capabilities for machine learning model training and inference. By leveraging BQML, the team can build and deploy sophisticated models directly within BigQuery, eliminating the complexities of managing separate machine learning environments. This streamlined approach accelerates the ability to detect and analyze cost anomalies in real time. In this case, Palo Alto Networks trained the ARIMA+ model on the last 13 months of billing data for specific applications on the Net Spend field to capture seasonality, spikes and dips, along with migration patterns and known spikes based on a custom calendar.

To enhance alerting and anomaly management processes, the team also utilizes Google Cloud Pub/Sub and Cloud Run functions. Pub/Sub facilitates the reliable and scalable delivery of anomaly notifications to relevant stakeholders. Cloud Run functions enable custom logic for processing these notifications, including intelligent grouping of similar anomalies to minimize alert fatigue and streamline investigations. This powerful combination allows Palo Alto Networks to respond swiftly and effectively to potential cost issues.
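As a rough illustration of the grouping step, the sketch below collapses anomalies that share an application and service into a single alert. The record fields (`application`, `service`, `impact`) and the grouping key are hypothetical; the source does not describe how Palo Alto Networks defines "similar" anomalies, and the Pub/Sub delivery itself is left out.

```python
from collections import defaultdict
from typing import Dict, List


def group_anomalies(anomalies: List[Dict]) -> List[Dict]:
    """Group anomalies by (application, service) and summarize each group.

    Assumption: two anomalies are "similar" when they share both fields.
    """
    groups = defaultdict(list)
    for anomaly in anomalies:
        key = (anomaly["application"], anomaly["service"])
        groups[key].append(anomaly)

    # One summary record per group, in first-seen order.
    return [
        {
            "application": application,
            "service": service,
            "count": len(items),
            "total_impact": sum(item["impact"] for item in items),
        }
        for (application, service), items in groups.items()
    ]


# Hypothetical anomaly records, as they might arrive from a notification topic.
sample = [
    {"application": "search", "service": "BigQuery", "impact": 120.0},
    {"application": "search", "service": "BigQuery", "impact": 80.0},
    {"application": "checkout", "service": "Cloud Run", "impact": 40.0},
]

for alert in group_anomalies(sample):
    print(alert)
```

Here three raw anomalies become two alerts, which is the kind of reduction that helps limit alert fatigue.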

2. Notification and analysis

Once the anomaly is captured, the solution computes the business cost impact and routes alerts to the appropriate application teams through Slack for further investigation. To accelerate root-cause analysis, it synthesizes critical information through text and images to provide all the details about the anomaly, pinpointing exactly when it occurred and which SKUs or resources are involved. Application teams can then further analyze this information and, with their application context, quickly arrive at a decision.

Here is an example of a snapshot that captured an increased cost in BigQuery that started on July 30th:

Figure 3 - Example of Anomaly Detected with Resource details

The cost anomaly solution automatically gathered all the information related to the flagged anomalies, such as Google Cloud project ID, data, environment, service names and SKUs, along with the cost impact. This data provided much of the necessary context for the application team to act quickly. Here is an example of the Slack alert:

Figure 4 - Example of anomaly alert on Slack

3. Mitigation

Once the root cause is identified, it's time to take action to mitigate the anomaly. This may involve anything from making a simple configuration change to deploying a hotfix. In some cases, it may be necessary to escalate the issue and involve cross-functional teams.

In the provided example, a cloud hosted tenant encountered a substantial increase in data volume due to a configuration error. This misconfiguration led to unusually high BigQuery usage. As no default BigQuery reservation existed in the newly established region, the system defaulted to the on-demand pricing model, incurring higher costs.

To address this, the team procured 100 baseline slots with a 3-year commitment and implemented autoscaling to accommodate any future spikes without impacting performance. To prevent similar incidents, especially in new regions, a long-term cost governance policy was implemented at the organizational level.

Post incident, the cost anomaly solution generates a blameless post mortem document containing the highlights of the actions taken, the impact of collaboration, and the cost savings achieved through timely detection and mitigation. This document focuses on:

A detailed timeline of events: This list might include when a cost increase was captured, when the team was alerted, and the mitigation plan with short-term and long-term initiatives to prevent a recurrence.
Actions taken: This description includes details about anomaly detection, the analysis conducted by the application team, and mitigative actions taken.
Preventative strategy: This describes the short-term and long-term plan to avoid similar future incidents.
Cost impact and cost avoidance: These calculations include the overall cost incurred from the anomaly and estimate the additional cost if the issue had not been detected in a timely manner.

A formal communication is then sent out to the Palo Alto Networks application team, including leadership, for further visibility.

From its experience working at scale, Palo Alto Networks has learned to embrace the fact that anomalies are unavoidable in cloud environments. To manage them effectively, a well-defined lifecycle encompassing detection, analysis, and mitigation is crucial. Automated monitoring tools play a key role in identifying potential anomalies, while collaboration across teams is also essential for successful resolution. In particular, the team places huge emphasis on the importance of continuous improvement for optimizing the anomaly management process. For example, they established the reporting dashboard below for long-term continuous governance.

Figure 5 - Cost Anomaly Reporting Dashboard in Looker

By leveraging the power of AI and partnering with Google Cloud, Palo Alto Networks is enabling businesses to unlock the full potential of AI while ensuring responsible and sustainable cloud spending. With a proactive approach to cost anomaly management, organizations can confidently navigate the evolving landscape of AI, drive innovation, and achieve their strategic goals. Check out the public preview of Cost Anomaly Detection or reach out to Google Cloud Consulting for a customized solution.

We are extremely grateful to the entire team for partnering together to build this solution: Yaping Gu, Matt Orr, Andy Crutchfield, and Gina Huh.

Gain control of your Google Cloud costs: Introducing the Cost Attribution Solution

Fri, 11 Oct 2024 16:00:00 +0000

As your Google Cloud usage expands, managing and understanding your cloud costs can become increasingly complex. As you drive adoption of cloud FinOps in your organization, identifying exactly which teams, projects, or services are driving your expenses is essential.

That's why we're excited to introduce the Google Cloud Cost Attribution Solution. This comprehensive suite of tools and best practices is designed to improve your cost metadata and labeling governance processes, enabling data-driven decisions so you can ultimately optimize your cloud spending. Whether you are just getting started or have been using Google Cloud for a while, the solution has tools and resources to help you.

Harness the power of labels

The Cost Attribution Solution leverages a fundamental Google Cloud feature that often goes underutilized: labels. These simple yet incredibly powerful key-value pairs act as metadata tags that you can attach to your Google Cloud resources. Think of them as customizable identifiers for your virtual machines, storage buckets, databases, and more. By strategically applying labels, you can unlock a wealth of cost insights:

Granular cost breakdowns: See exactly how much you're spending on specific services, applications, environments (like development, testing, and production), or even individual teams within your organization.
Data-driven decisions: Make informed choices about where to allocate resources, how to optimize costs, and what future investments are justified.
Customizable reporting: Generate reports tailored to your organization's specific needs. Need a breakdown of costs by department? Or by project phase? Labels make it possible.

Imagine being able to instantly answer questions like:

What's the cost difference between our development and production environments?
How much is the marketing team spending on cloud resources compared to the engineering team?
Are there specific services or applications that are disproportionately driving our monthly bill?
What's the true infrastructure cost of running our critical shopping cart service?

With the Cost Attribution Solution, these insights are no longer out of reach.

Proactive and reactive strategies for label governance

We understand that every organization's Google Cloud environment is unique, with different levels of maturity in cloud adoption and resource management. That's why the Cost Attribution Solution offers both proactive and reactive governance approaches for labels:

Proactive governance (enforcement): Start on the right foot by enforcing consistent and accurate labeling from the moment you provision new resources. Terraform Policy Validation integrates into your infrastructure-as-code workflows, helping ensure that every new resource is tagged correctly according to your organization’s labeling policies. This prevents cost tracking gaps and improves data accuracy from day one.

Reactive governance (reporting, alerting and reconciliation): For existing resources, we offer a dual approach:
- Reporting: Our tools help you identify unlabeled resources, providing a clear picture of where you may have gaps in cost visibility down to individual projects and resources.

- Alerting: Receive near real-time alerts when resources are created or modified without the proper labels, enabling you to quickly rectify any issues and maintain control over your cloud costs.

Reconciliation: Go beyond just reporting by actively enforcing your labeling policies on existing projects. This empowers you to automate the application of correct labels to unlabeled or mislabeled resources, for comprehensive cost visibility and data accuracy across your entire Google Cloud landscape.
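To make the idea of a labeling policy concrete, here is a minimal sketch of a label check that could back either an enforcement gate or a report of unlabeled resources. It is not the Cost Attribution Solution's code and does not use Terraform Policy Validation; the required keys, allowed environment values, and resource names are hypothetical.

```python
from typing import Dict, List

# Hypothetical policy: every resource must carry these label keys.
REQUIRED_LABELS = {"team", "environment", "application"}
# Hypothetical allowed values for the "environment" label.
ALLOWED_ENVIRONMENTS = {"dev", "test", "prod"}


def label_violations(labels: Dict[str, str]) -> List[str]:
    """Return a list of policy violations for one resource's labels."""
    missing = sorted(REQUIRED_LABELS - set(labels))
    violations = [f"missing label: {key}" for key in missing]

    environment = labels.get("environment")
    if environment is not None and environment not in ALLOWED_ENVIRONMENTS:
        violations.append(f"invalid environment: {environment}")
    return violations


# Hypothetical inventory: resource name -> labels currently applied.
resources = {
    "vm-checkout-1": {"team": "payments", "environment": "prod", "application": "checkout"},
    "bucket-logs": {"team": "platform", "environment": "staging"},
}

for name, labels in resources.items():
    problems = label_violations(labels)
    status = "OK" if not problems else "; ".join(problems)
    print(f"{name}: {status}")
```

The same function can run before provisioning (proactive) or over an inventory of existing resources (reactive), so both paths apply one policy definition.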

Getting started

Ready to embark on your journey towards cost transparency? Our GitHub repository and the documentation on best practices for labels are your starting point. You'll find a wealth of resources, including:

Best practices: A guide to designing and implementing an effective labeling strategy tailored to your organization's structure and goals.
Solution architectures: Detailed diagrams and explanations of how to deploy the Cost Attribution Solution components in your Google Cloud environment.
Code samples and tutorials: Hands-on examples to help you get started quickly.

Here is a Looker Studio dashboard for interactive cost visualization and additional tools to streamline your cost management processes.

Furthermore, our Google Cloud Consulting FinOps experts can assess your needs and chart a course to fully integrate the cost attribution solution across your organization running on Google Cloud today.

Embrace cost transparency

Gain granular visibility into your cloud spending with the Google Cloud Cost Attribution Solution. Leverage labels to achieve granular cost breakdowns, optimize resource usage, and make data-driven decisions that align with your business goals. The solution will soon incorporate support for tags, offering a powerful way to organize resources across projects and implement fine-grained access control through IAM conditions. This additional layer of resource management empowers you to not only understand your costs but also streamline operations and enhance security.

Unlock the full potential of your cloud infrastructure and drive greater efficiency and ROI with the Cost Attribution Solution.

Reduce unexpected costs with the new AI-powered Cost Anomaly Detection

Mon, 07 Oct 2024 16:00:00 +0000

Controlling runaway spend and minimizing unexpected costs is a priority for every business. Imagine a scenario where faulty development or rogue code results in a usage spike over the weekend, unbeknownst to you. If not caught in time, this kind of usage can result in cost spikes that can exhaust your budgets and put a strain on finances.

At Google Cloud, we provide customers with a comprehensive set of cost management tools and controls to help prevent surprises. Now, we’re expanding our FinOps capabilities with AI technology that further simplifies cost management and helps ensure spend predictability. At Google Cloud Next ’24, we announced Cost Anomaly Detection and today, it's available to all customers in public preview. Cost Anomaly Detection helps identify anomalies in real or near-real-time and enables timely alerts so that you can avoid surprises, take swift action and control runaway costs.

Getting to know Cost Anomaly Detection

Google Cloud’s Cost Anomaly Detection can help you identify unusual spikes in cloud spending, across all products and services, by automatically monitoring your cloud projects and displaying any spikes in your billing console. This product does not require any setup and is available at no cost for all customers. Important components include:

1. Detection

Using AI, Cost Anomaly Detection identifies your spend patterns based on historical and seasonal trends and forecasts an expected rate of daily spend specific to your project. It continuously monitors your actual spend every hour and detects any deviation. These deviations are then identified as spikes or anomalies — a.k.a. ‘cost impact’ within the Cost Anomaly Detection dashboard. Since Cost Anomaly Detection monitors your spend on an hourly basis, it can identify any unexpected upward spikes within 24 hours, for most services, detecting anomalies in near real-time.

List of anomalies ordered by date

2. Investigation

Once an anomaly is detected, you want to understand its root cause. For each anomaly it identifies, Cost Anomaly Detection provides a detailed, easy-to-understand root-cause analysis that lists the top contributors to the spend. This allows you to narrow your investigation on the exact project, service, region or SKU that needs corrective action, thereby enabling quicker remediation.

Root cause analysis panel

3. Alerts

Once you know of an anomaly and its root cause, the appropriate owners need to be alerted of the impact to their respective projects, so they can cap or turn off usage. Today, anomaly notifications are sent through email and Pub/Sub, allowing for a wide range of personas to be notified, from the FinOps team to engineering. Cost Anomaly Detection also lets you easily set up customizable alert preferences that notify a set of desired recipients of an anomaly as soon as it is detected, while Pub/Sub alerts help with integration with your internal workflow management tools.

Set customized alerts for anomalies

Cost Anomaly Detection also lets you tailor your alerting threshold, based on cost impact, so that only significant anomalies are displayed and alerted. We recommend monitoring anomalies for at least one month before defining a threshold that applies across all your projects.

Additionally, Cost Anomaly Detection is continuously learning about your spend patterns, helping to reduce the possibility of false positives and increase sensitivity to not only monthly and seasonal trends, but also inter-day and inter-week fluctuations. To that end, for every identified anomaly, you can provide feedback on whether it was truly unexpected or a false positive due to, for example, a planned migration. This feedback helps the Cost Anomaly Detection AI models adapt in real time to your usage and take planned usage into consideration when evaluating future spikes.

Enhanced cost observability

With Cost Anomaly Detection, you have another way of optimizing your spend: controlling unintended cost. This, when coupled with existing tools such as Budgets, allows for a more robust and flexible cost-control governance. The product requires no setup, detects same-day anomalies, and enables focused action through detailed root-cause analysis and near-real-time alerts. If you’re already using your own anomaly detection solution, we encourage you to try Cost Anomaly Detection for free, to compare and contrast the results and the customizable controls available.

Head over to the Google Cloud billing console to access this experience and start elevating your FinOps game! For more details on this product, read the documentation here.

BigQuery jobs explorer: Your central hub for monitoring and troubleshooting BigQuery jobs

Mon, 23 Sep 2024 16:00:00 +0000

Ever feel overwhelmed by the sheer number of SQL queries running in your organization? Identifying expensive queries, tracking who's running them, and spotting spikes in errors are all part of the daily work. Efficient monitoring and management of query activity are essential to maintain a healthy and performant system.

We are excited to announce BigQuery jobs explorer, your command center for all things query-related. Now generally available, BigQuery jobs explorer helps you gain deep visibility into your organization's query activity, streamline troubleshooting, and optimize resource utilization.

"BigQuery jobs explorer gives us a comprehensive single-pane view of SQL activity across our entire organization, which helps us pinpoint anomalies and address them proactively. Since its launch, jobs explorer has become an essential asset in boosting platform efficiency and maintaining optimal system performance at PayPal!" - Abhijit Vyas, Senior MTS Database Engineer, PayPal

Solve multiple challenges with a single tool

BigQuery jobs explorer is a versatile tool that empowers you to tackle a wide range of use cases, all from a single platform.

Monitor: Get a bird's-eye view of query activity

Jobs explorer provides a comprehensive, real-time view of all SQL activity across your organization. No more piecing together information from different sources — you get a single pane of glass to see what's happening, when, and where.

Real-time monitoring: Track job status, progress, and resource usage as they happen.
Key metrics at a glance: Sort and analyze traffic based on metrics like TotalSlotMS, bytes processed, and more.
Visualize query execution: Intuitive graphs make it easy to understand query performance patterns.
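For readers who want to reproduce the "sort by metric" idea outside the console, here is a small Python sketch. The field names total_slot_ms and total_bytes_processed mirror columns of the INFORMATION_SCHEMA.JOBS view; the job records themselves and the top_jobs helper are made up for illustration.

```python
def top_jobs(jobs, metric="total_slot_ms", limit=10):
    """Return the jobs with the highest value for the chosen metric.

    Missing or null metric values are treated as zero.
    """
    return sorted(jobs, key=lambda j: j.get(metric) or 0, reverse=True)[:limit]


if __name__ == "__main__":
    sample = [
        {"job_id": "job_a", "total_slot_ms": 1200, "total_bytes_processed": 5_000},
        {"job_id": "job_b", "total_slot_ms": 98_000, "total_bytes_processed": 800},
        {"job_id": "job_c", "total_slot_ms": None, "total_bytes_processed": 90_000},
    ]
    for job in top_jobs(sample, metric="total_slot_ms", limit=2):
        print(job["job_id"], job["total_slot_ms"])
```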

With this level of visibility, you can proactively identify potential issues, spot trends, and make informed decisions about resource allocation.

Troubleshoot: Quickly identify and resolve problems

When something goes wrong, jobs explorer helps you get to the root of the problem fast.

No more complex queries: Access critical job information without writing any INFORMATION_SCHEMA queries.
Powerful filtering and sorting: Quickly narrow down jobs by status, priority, owner, project, and more.
Take action: Kill runaway queries directly from jobs explorer to save costs and reclaim resources.
Deep dive into query details: Click on any job to see its execution graph and other key execution details.

Jobs explorer simplifies the process of troubleshooting, allowing you to focus on keeping your BigQuery environment running smoothly.

Optimize: Improve performance and control costs

Jobs explorer isn't just about reacting to problems — it's about proactively optimizing your BigQuery usage.

Identify performance bottlenecks: Pinpoint queries that are consuming excessive resources or taking too long to complete.
Query performance insights: Find and address queries that have been tagged by BigQuery with actionable performance insights.
Control costs: Avoid overspending by identifying and addressing inefficient queries. Often a small fraction of your queries accounts for the majority of your optimization gains!
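The claim that a small fraction of queries drives most of the gains can be checked with simple arithmetic. This Python sketch computes the share of total slot time consumed by the top fraction of jobs. The helper name share_of_top and the use of total_slot_ms as the cost proxy are assumptions for illustration.

```python
import math


def share_of_top(jobs, fraction=0.1, metric="total_slot_ms"):
    """Share of the metric's total consumed by the top `fraction` of jobs."""
    values = sorted((j[metric] for j in jobs), reverse=True)
    total = sum(values)
    if total == 0:
        return 0.0
    # Always look at one job or more, even for tiny fractions.
    k = max(1, math.ceil(len(values) * fraction))
    return sum(values[:k]) / total
```

If the result for fraction=0.1 is close to 1.0, tuning that handful of jobs is likely to matter more than broad changes across every query.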

By giving you a deeper understanding of your query activity, jobs explorer helps you make the most of your BigQuery spend.

What's next?

BigQuery jobs explorer is just the beginning. We're committed to continuously improving and expanding its capabilities to meet your evolving needs. Stay tuned for future updates, and feel free to share your feedback at bq-query-inspector-feedback@google.com. To learn about the feature in more detail, please see the public documentation.

Flexible committed-use discounts are now even more flexible

Mon, 15 Jul 2024 18:00:00 +0000

Google Cloud offers many great ways to run your workloads: low-level VMs in Google Compute Engine, container orchestration with Google Kubernetes Engine (GKE) — including via fully-managed Autopilot mode — and Cloud Run. Until now, to optimize your spend, you needed to purchase several Committed-use Discounts (CUDs) to cover each of these different products. For example, you might have purchased a Compute Engine Flexible CUD for VM spend including workloads running on GKE’s standard mode, a Cloud Run CUD for Cloud Run always-on instances, and an Autopilot CUD for workloads running in GKE Autopilot.

Expanding Compute Flexible CUDs

Today we are excited to announce that the Compute Engine Flexible CUD, now known as the Compute Flexible CUD, has been expanded to cover Cloud Run on-demand resources, most GKE Autopilot Pods and the premiums for Autopilot Performance and Accelerator compute classes. The documentation and our SKU list have the precise details on what’s included.

With one CUD purchase, you can cover eligible spend on all three products: Compute Engine, GKE, and Cloud Run. You can save 46% for a three-year commitment, and 28% for one-year commitments. With this single unified CUD, you can now make a single commitment and spend it across all these products, maximizing its flexibility. Furthermore, these commitments are not region-specific, so you can use them on resources in any region across these products.
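As a worked example of the discount rates quoted above, this Python sketch computes what eligible spend would cost when fully covered by a commitment. The 28% and 46% figures come from this announcement; the function name and the assumption that all spend is eligible and fully covered are illustrative simplifications.

```python
# Discount percentages quoted in the announcement, keyed by term in years.
DISCOUNT_PCT = {1: 28, 3: 46}


def committed_cost(on_demand_spend: float, term_years: int) -> float:
    """Cost of eligible spend that is fully covered by a commitment."""
    pct = DISCOUNT_PCT[term_years]
    return on_demand_spend * (100 - pct) / 100


if __name__ == "__main__":
    for years in (1, 3):
        print(years, "year term:", committed_cost(1000, years))
```

Under these assumptions, 1,000 of eligible on-demand spend would cost 720 on a one-year term and 540 on a three-year term, in whatever currency you are billed in.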

Retiring the Autopilot CUD

Since the new expanded Compute Flexible CUD has a higher discount than the GKE Autopilot CUD and greater overall flexibility, we’re retiring the GKE Autopilot CUD. You can still purchase the legacy GKE Autopilot CUD until October 15, after which it will no longer be available for purchase. Any existing CUDs will continue to apply through their term regardless of when you purchase them. That said, we recommend looking into the newly expanded Compute Flexible CUD for your needs now and in the future, for its greater flexibility and better discounts!

How to get started

If you're already using Flexible CUDs for Compute Engine, you'll automatically see the discounts applied to eligible Cloud Run and GKE Autopilot usage (if you have product-specific CUDs like the legacy GKE Autopilot CUD, those will apply first). If you're new to Compute Flexible CUD, it's easy to get started: estimate your hourly spend across eligible SKUs, and purchase a commitment that matches your expected sustained usage over the one- or three-year term, and start enjoying the savings! You can add additional CUDs as your usage grows.
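One way to reason about sizing a commitment from your estimated hourly spend is sketched below in Python. It assumes a simplified model: the commitment is expressed as on-demand-equivalent spend per hour, you pay the discounted committed amount every hour whether or not you use it, and any spend above the commitment is billed at on-demand rates. The function names are hypothetical, and real billing has more detail than this model captures.

```python
def hourly_cost(spend: float, commitment: float, discount_pct: int) -> float:
    """Cost for one hour under the simplified commitment model."""
    committed_fee = commitment * (100 - discount_pct) / 100  # paid even if unused
    overflow = max(0.0, spend - commitment)  # billed at on-demand rates
    return committed_fee + overflow


def total_cost(hourly_spend, commitment: float, discount_pct: int) -> float:
    """Total cost over a series of hourly on-demand spend figures."""
    return sum(hourly_cost(s, commitment, discount_pct) for s in hourly_spend)


def best_commitment(hourly_spend, discount_pct: int) -> float:
    """Try each observed spend level (and zero) and keep the cheapest."""
    candidates = sorted(set(hourly_spend) | {0.0})
    return min(candidates, key=lambda c: total_cost(hourly_spend, c, discount_pct))
```

In this model, committing at the level of a short-lived peak costs more than committing at the sustained baseline, which is why the guidance is to match expected sustained usage.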

We hope you find this new flexibility useful when it comes to platforming your workloads on Google Cloud!

Next steps

Learn about Compute Flexible CUDs
View Cloud Run pricing
View GKE pricing and CUD options
Purchase a Compute Flexible CUD in the console

Normalize billing data across clouds with new Looker template and BigQuery views

Fri, 21 Jun 2024 16:00:00 +0000

At Google Cloud, we strongly believe you should have resources to analyze Google Cloud costs alongside other cloud providers, so you can better manage and optimize cloud costs. You should not need to spend time mapping billing terminology across cloud providers. And we believe in doing that through open standards. We were a founding member of the FinOps Foundation, a founding Steering Committee member of the FOCUS™ project, and a core contributor for the v0.5 and v1.0 Preview and GA open billing specifications. Today, we’re excited to announce a new Looker template view that leverages the recent FOCUS v1.0 GA to help simplify cloud cost management across clouds.

What is FOCUS?

The unifying specification for cloud billing data, FOCUS is a technical specification that normalizes cost and usage billing data across cloud vendors. FOCUS aims to deliver consistency and standardization across cloud billing data by unifying cloud and usage data into one common data schema. Before FOCUS, there was no industry-standard way to normalize key cloud cost and usage measures across multiple cloud service providers (CSPs), making it challenging to understand how billing costs, credits, usage, and metrics map from one cloud provider to another (see FinOps FAQs for more details).

The FOCUS initiative is developing an open standard for cloud billing data and is being adopted by all major cloud vendors. With the introduction of Version 1.0, there is a common taxonomy, terminology, and metrics for billing datasets produced by CSPs.

Introducing a new Looker template for FOCUS v1.0 GA

Our new Looker template allows you to visualize your open billing data in Looker, generating a table based on the results of the FOCUS query. The provided LookML code creates and manages these tables automatically, so you won't need to create them manually. This template offers a glimpse of what’s possible when visualizing your cost trends across services, SKUs, zones, regions, and resource types, and it offers several benefits:

Out-of-the-box template: No more waiting for custom dashboards. The template gives you immediate access to pre-built visualizations that reveal cost trends, broken down by services, charges, and regions.
Easy filtering: You don't need to be a data analyst to use this template. Looker has an intuitive interface that lets you filter to specific time periods or services, and drill down into details with just a few clicks.
Customizability: While the template is a great starting point, Looker's flexibility lets you tailor the views to your specific needs. If you need to add custom metrics, change the visualizations, or embed the dashboards into your existing workflows, you can do that easily.

View your costs by billed services, publisher, commitments and more

An updated BigQuery view for FOCUS v1.0 GA

We offer three ways to export cost and usage-related Cloud Billing data to BigQuery: Standard Billing Export, Detailed Billing Export (resource-level data and price fields to join with Price Export table), and Price Export. In January, we introduced a new BigQuery view, a virtual table that represents the results of a SQL query, that transforms data towards FOCUS v1.0 Preview format. Today, we’re announcing an update to that BigQuery view to adapt towards the FOCUS v1.0 GA. If you are already using the Preview and want to update your BigQuery view to the FOCUS GA, please see the existing guide, which is kept up-to-date to reflect any new changes.

BigQuery views are great because the queryable virtual table only contains data from the tables and fields specified in the base query that defines the view. BigQuery views are virtual tables, so they incur no additional charges for data storage if you are already using Billing Export to BigQuery. With this BigQuery view you can:

View and query Google Cloud billing data that is adapted towards the FOCUS v1.0 specification
Use the BigQuery view as a data source for visualization tools like Looker Studio
Analyze your Google Cloud costs alongside data from other providers using the common FOCUS format
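To show why a common schema helps with the last point, here is a minimal Python sketch that totals cost across providers once every row uses FOCUS column names. ProviderName, ServiceCategory, and BilledCost are columns defined by the FOCUS v1.0 specification; the sample rows and the helper name are invented for illustration.

```python
from collections import defaultdict


def cost_by_provider_and_category(rows):
    """Sum BilledCost per (ProviderName, ServiceCategory) pair."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["ProviderName"], row["ServiceCategory"])] += row["BilledCost"]
    return dict(totals)


if __name__ == "__main__":
    rows = [
        {"ProviderName": "Google Cloud", "ServiceCategory": "Compute", "BilledCost": 10.0},
        {"ProviderName": "Google Cloud", "ServiceCategory": "Compute", "BilledCost": 5.0},
        {"ProviderName": "Other Cloud", "ServiceCategory": "Storage", "BilledCost": 2.5},
    ]
    print(cost_by_provider_and_category(rows))
```

Because each provider's data shares the same column names, the aggregation needs no per-provider branching.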

How it works

The FOCUS BigQuery view acts as a virtual table that sits on top of your existing Cloud Billing data. To use this feature, you will need Detailed Billing Export and Price Exports enabled. The FOCUS BigQuery view uses a base SQL query to map your Cloud Billing data into the FOCUS schema, presenting it in the specified format. This allows you to query and analyze your data as if it were native to FOCUS, making it easier to analyze costs across different cloud providers.
The Looker template is supported by Looker and Looker Core, not Looker Studio. To use the template out of the box, ensure you have Detailed Billing Export and Price Export enabled. You will also need permissions to create a new Looker project and connection.
Unlike BigQuery Views, this Looker template utilizes temporary tables. The provided LookML code will create and manage these tables automatically, so you won't need to create them manually.
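The mapping step that the FOCUS BigQuery view performs in SQL can be pictured with a short Python sketch. This is a simplified illustration, not the query used by the official view: the input keys follow the Cloud Billing export schema (cost, currency, service.description, sku.id, location.region, usage_start_time, usage_end_time), the output keys are FOCUS v1.0 column names, and the choice of which fields to map is an assumption.

```python
def to_focus(row: dict) -> dict:
    """Map one billing export row to a few FOCUS-style columns (illustrative)."""
    return {
        "ProviderName": "Google Cloud",
        "BilledCost": row["cost"],
        "BillingCurrency": row["currency"],
        "ServiceName": row["service"]["description"],
        "SkuId": row["sku"]["id"],
        # Region can be absent for some charges, so fall back to None.
        "RegionId": (row.get("location") or {}).get("region"),
        "ChargePeriodStart": row["usage_start_time"],
        "ChargePeriodEnd": row["usage_end_time"],
    }
```

The real view handles many more columns and cases, such as credits and pricing fields joined from the Price Export.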

We've made it easy to leverage the power of FOCUS in Looker and in BigQuery with a step-by-step guide. To view this Looker template and sample SQL query and follow the step-by-step guide, sign up here.

Compare costs by services, regions, availability zones, and commitments

Looking ahead: Leading in open billing standards

We look forward to continuing to shape open billing standards alongside our customers, FinOps practitioners in the industry, the FinOps Foundation, CSPs, SaaS providers, and more. Get a unified view of your cloud costs today with the FOCUS Looker template and BigQuery view. Sign up here to learn more and get started.

Special thanks to Paige Rutherford, Sidney Stefani, Jingjie Zheng, Jacky Liu, and Gina Huh who helped develop these features.

Leveling up FinOps: 5 cost management innovations from FinOps X 2024

Fri, 21 Jun 2024 16:00:00 +0000

At Google Cloud, our FinOps product philosophy is that all cloud costs should be visible and allocated, spend should be efficient with no waste, and there should be no surprise costs. And once again, Google Cloud is at the forefront of FinOps innovation, leading with some exciting new product announcements at FinOps X 2024. Because if there is one thing we love, it’s unlocking cloud value for everyone through innovation!

Here are five ways we’re revolutionizing FinOps this year at FinOps X:

1. Making open cloud billing data a reality

At Next’24 we announced a new BigQuery view that transforms Google Cloud cost data so that it aligns with the attributes and metrics defined in the latest FinOps Open Cost & Usage (FOCUS) specification. A BigQuery view is a virtual table that represents the results of a SQL query; if you already use Billing Export to BigQuery, it incurs no additional data storage charges. This week, we updated the BigQuery view to match the latest FOCUS v1.0 Specification GA release, and announced a FOCUS Looker view that works with this BigQuery View.

With the FOCUS Looker view, you can now:

Visualize and filter your Google Cloud billing data that is adapted towards the FOCUS specification
Visualize your costs, changes, services, regions, and availability zones on intuitive graphs
Limit your manual work; the provided LookML code creates and manages these tables automatically — no need to create them manually

Visualize your Google Cloud data, normalized according to FOCUS 1.0 standards, sorted by list cost

2. Speaking in the language of business, not technology

In partnership with Google Cloud’s AI research teams, we have evolved Gemini Cloud Assist to help augment your FinOps cost management capabilities, embedded within Reports. With Gemini Cloud Assist, our express goal is to put accuracy above everything, because when it comes to cloud costs, you can’t afford to be right only some of the time. Here are a few ways Gemini can help you save time:

Create cost reports on the fly: Simply tell Gemini Cloud Assist what costs you want to learn about, for example “What are my Compute costs for Project Dora last month?” Or you can ask a business question you’re grappling with e.g., “What caused my costs to increase last quarter?” Gemini Cloud Assist helps to provide you with the right Cost Report, so you can be confident about its answer, and dive deeper to answer your questions.
Summarize key insights: You no longer need to download and manually analyze data to understand your costs. Gemini Cloud Assist provides key insights directly within your cost reports, offering instant access to the most significant cost drivers and trends without digging through the data.
Go deep into granular cost trends: Using Billing BigQuery Exports (BQE), you no longer have to write queries to replicate the data you see in your Cost Reports. Anytime you view a cost report of your Google Cloud usage, we can provide you with a BigQuery script to dive deeper into the granular costs, turning FinOps professionals into data scientists.

Gemini Cloud Assist for FinOps aims to augment your efforts in a manner that puts accuracy and privacy above all else. And for extra peace of mind, we’ve also made it easier for you to quickly audit our answers.

3. Expanding the definition of cost to include carbon

FinOps hub now integrates carbon footprint reporting to optimize your cloud environments for both financial performance and sustainability. Carbon footprint reporting lets you measure, report, and reduce carbon emissions while achieving your business goals. Through location-based carbon emission data and Google's unattended project recommendations, FinOps hub provides actionable insights to drive impactful decisions that benefit both the bottom line and the planet. Google’s unattended project recommendation uses historical usage to provide recommendations about idle resources that can save you both money and carbon emissions.

By using carbon reporting directly in FinOps hub, you can gain a better understanding of your cloud environment's environmental impact, for example:

Identify emission hotspots: easily pinpoint the regions, projects, and products that contribute to most of your carbon footprint. Use this valuable information to help you identify changes you can make to improve your sustainability posture.
Set, track, and achieve sustainability goals: the carbon footprint report can be used as the baseline for setting and tracking your sustainability goals.
Identify carbon-efficient regions: To reduce your carbon footprint, you can use carbon footprint reporting to identify the most carbon-efficient regions and deploy your resources there. FinOps hub recommendations now include "Low CO2" indicators to identify the most efficient regions.

View your carbon footprint across regions, projects, or individual Google Cloud services.

4. Modeling what an efficient cloud looks like, in near real-time

We've heard your feedback loud and clear. You love our CUD recommendations, but you need more power to model "what-if" scenarios that reflect your unique business reality. That's why we're thrilled to introduce FinOps hub’s Scenario Modeling for CUDs.

Now, you can build scenarios that reflect your business reality and quickly identify the right level of commitments to match your commitment strategy. Then, unlock more savings by:

Understanding usage patterns: Dive deep into historical data with customizable lookback periods of 30, 60, 90 or 180 days (lookback of 180 days will be available in July 2024)
Eliminating data noise: Easily filter out anomalies and outliers that could skew your projections.
Seeing instant results: Adjust your model parameters and watch the recommended commitment amount, estimated monthly savings, and usage pattern graphs update in near real-time.
Collaborating with confidence: share your model with colleagues and decision-makers to foster alignment and drive informed decision making.
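The lookback and outlier-filtering steps above can be sketched in a few lines of Python. The lookback options (30, 60, 90, 180 days) come from this post; the outlier rule (drop days above a multiple of the median) and the function name are assumptions for illustration and are not the method FinOps hub uses.

```python
import statistics

# Lookback periods listed in the announcement.
LOOKBACK_DAYS = (30, 60, 90, 180)


def typical_daily_spend(daily_spend, lookback_days=30, outlier_factor=3.0):
    """Average spend over the lookback window after dropping spike days.

    A day counts as an outlier when it exceeds outlier_factor times the
    window's median. This rule is an illustrative assumption.
    """
    if lookback_days not in LOOKBACK_DAYS:
        raise ValueError(f"lookback_days must be one of {LOOKBACK_DAYS}")
    window = daily_spend[-lookback_days:]
    median = statistics.median(window)
    kept = [x for x in window if x <= outlier_factor * median]
    return statistics.mean(kept)
```

Changing lookback_days or outlier_factor and re-running is a rough stand-in for adjusting model parameters and watching the projection update.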

Speaking of FinOps hub, we’re also adding new idle reservation recommendations! Compute Engine reservations guarantee your business access to critical Google Cloud Platform compute resources even during periods of peak demand or unexpected events, helping to ensure uninterrupted operations and preventing costly downtime. However, some customers forget to remove these reservations once they no longer need them. FinOps hub's new Idle Reservation recommendation lets you optimize cloud costs and eliminate waste by analyzing usage patterns. For example, it can identify reservations that haven't been utilized for a customizable period (default: 7 days), so you can delete them and reduce unnecessary spending.
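The idle-period check described above amounts to a date comparison. This Python sketch flags reservations that have not been used within a configurable window, defaulting to 7 days as in the post. The input shape (a mapping of reservation name to last-used date, with None meaning never used) and the function name are hypothetical.

```python
from datetime import date, timedelta


def idle_reservations(last_used, today, idle_days=7):
    """Names of reservations with no usage in the last `idle_days` days.

    `last_used` maps reservation name to the date it was last utilized,
    or None if it has never been used (hypothetical input shape).
    """
    cutoff = today - timedelta(days=idle_days)
    return sorted(
        name for name, used in last_used.items() if used is None or used <= cutoff
    )
```

The returned names are candidates for review rather than automatic deletion, since a reservation may be held deliberately for an upcoming peak.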

See unused reservation recommendations and review details now within the FinOps hub!

5. Sending actionable alerts, not noise

At Next, we announced the private preview of our Cost Anomaly Detection solution, which continuously monitors your Google Cloud projects to identify any unexpected cost overruns in near real time. Each unexpected cost spike is explained with a granular root cause, indicating the top drivers down to the SKU. Now, you can easily configure alert preferences for your anomalies within the billing console. Through an easy, one-time setup, you can configure email or Pub/Sub alerts either for every individual anomaly or as a daily summary for your desired set of recipients. You can also set up a cost impact threshold to ensure that you only receive alerts for anomalies that you consider significant. Further, you can influence our smart, AI-driven anomaly detection algorithm by providing feedback with a single click. Lastly, you can download a CSV of anomalies dating back three months.

Final thoughts

Through continual product innovation and evolution, we’re constantly striving to solve real-world FinOps problems for our Google Cloud customers. Please try out these new releases, and sign up for our FinOps User Group to be a part of our product development efforts. Let us know what you think.