GCP Vertex AI Pricing: 2026 Gemini, Model Garden &…

GCP Vertex AI pricing in 2026 has become Google's most aggressive AI commercial strategy. Vertex AI is Google's unified machine learning platform - hosting Gemini foundation models, the Model Garden of third-party models (including Claude from Anthropic, Llama from Meta, and a long tail of open-weight models), custom training infrastructure, and the AutoML and AI Studio layers. Google's competitive position against AWS Bedrock and Azure OpenAI Service is built on Gemini's price-to-capability ratio plus aggressive Google Cloud Committed Use Discount (CUD) economics.

Across $2.4B+ in negotiated contracts at SoftwareContractNegotiation and 500+ engagements spanning 15 vendor practices - including over 60 AI-focused engagements in the past 18 months - the patterns on Vertex AI are now well established. Standalone Vertex AI pricing is the published rate; Vertex AI inside a Google Cloud CUD closes 18 to 32% below published; Vertex AI with Provisioned Throughput inside a CUD closes 38 to 52% below standalone on-demand. The 38% portfolio reduction figure is achievable on Vertex AI when the negotiation is structured properly.

How GCP Vertex AI pricing is structured in 2026

Gemini on-demand token pricing

On-demand token pricing is the default billing model. Indicative 2026 published rates: Gemini 2.5 Pro at $1.25 / $10 per million input/output tokens (with a long-context surcharge above 200k tokens); Gemini 2.5 Flash at $0.30 / $2.50; Gemini 2.5 Flash-Lite at $0.10 / $0.40; Gemini 2.5 Ultra (preview) at $7 / $21. These rates have moved downward consistently through 2025 and 2026 as Google has pressed the price-performance lever; we expect another step-down by Q4 2026.
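To make the rate card concrete, the following is a minimal Python sketch that estimates monthly on-demand spend from token volumes. The rates are the indicative figures quoted above; the model keys, the function name, and the example volumes are hypothetical, and the long-context surcharge is deliberately ignored.

```python
# Indicative 2026 published rates quoted above,
# in USD per million tokens: (input, output).
# The dictionary keys are illustrative labels, not official model IDs.
RATES_PER_MILLION = {
    "gemini-2.5-pro": (1.25, 10.00),
    "gemini-2.5-flash": (0.30, 2.50),
    "gemini-2.5-flash-lite": (0.10, 0.40),
}


def on_demand_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return on-demand cost in USD, ignoring the long-context surcharge."""
    input_rate, output_rate = RATES_PER_MILLION[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Hypothetical monthly volume: 2B input tokens and 400M output tokens.
for name in RATES_PER_MILLION:
    cost = on_demand_cost(name, 2_000_000_000, 400_000_000)
    print(f"{name}: ${cost:,.2f} per month")
```

Running the same volume through each tier shows how strongly model choice drives the bill before any discount is negotiated.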

Model Garden - third-party models

Vertex AI's Model Garden hosts Claude (Anthropic), Llama (Meta), Mistral, Jamba (AI21), and a long tail of open-weight models. Claude 3.7 Sonnet on Vertex prices at approximately $3 / $15 per million input/output tokens (broadly parity with Bedrock); Llama 4 70B on Vertex prices at $2.50 / $3.40. Third-party models on Vertex are subject to Google's CUD discount mechanics, which is a meaningful advantage for multi-model strategies.

Provisioned Throughput

Vertex AI offers Provisioned Throughput, purchased in Generative AI Scale Units (GSUs), for predictable workloads, with dedicated throughput at flat hourly rates. Provisioned Throughput on Gemini 2.5 Pro at sustained 70%+ utilisation lands at 35 to 50% below equivalent on-demand spend. One-year and three-year commitment durations offer further discounts (typically 22% and 38% respectively).
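The utilisation claim above implies a break-even point below which Provisioned Throughput costs more than on-demand. The sketch below derives it under one simplifying assumption that is not from Google's rate card: on-demand spend scales linearly with utilisation while the Provisioned Throughput charge stays flat. The function name is hypothetical.

```python
def break_even_utilisation(reference_utilisation: float,
                           saving_at_reference: float) -> float:
    """Utilisation at which flat-rate Provisioned Throughput equals on-demand spend.

    Assumes on-demand spend scales linearly with utilisation while the
    Provisioned Throughput charge stays flat (a simplification).
    """
    # Flat PT cost as a fraction of on-demand spend at 100% utilisation.
    pt_cost_fraction = reference_utilisation * (1.0 - saving_at_reference)
    # On-demand spend matches the PT cost when utilisation equals that fraction.
    return pt_cost_fraction


# The 35% and 50% savings at 70% utilisation are the figures quoted above.
for saving in (0.35, 0.50):
    u = break_even_utilisation(0.70, saving)
    print(f"saving of {saving:.0%} at 70% utilisation -> break-even near {u:.1%}")
```

Under that linear assumption the break-even sits at roughly 35 to 46% utilisation, so workloads that cannot sustain utilisation above that band are likely better left on-demand.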

Custom training infrastructure

Vertex AI Training for custom models is billed at compute rates on TPU v5p, TPU v6e (Trillium), and various GPU SKUs (A100, H100, H200). TPU pricing under CUDs is consistently the cheapest large-scale training infrastructure available across hyperscalers in 2026.

AI Studio, Agent Builder, and Vertex AI Search

Higher-level Vertex products - AI Studio (the GUI workspace), Agent Builder (the agent orchestration layer), and Vertex AI Search (the RAG-as-a-service product) - carry their own per-query, per-document, and per-user pricing. Cumulative cost from these layers typically adds 15 to 35% to the apparent Vertex AI bill.

Vector storage and embeddings

Vertex AI Vector Search (formerly Matching Engine) and the text embedding models are billed separately. Vector Search has both an indexing cost and a serving cost; embeddings are token-billed at low rates. RAG-heavy workloads should model these costs explicitly.

Real-world Vertex AI deal sizes

Three reference points anchor the discussion. A mid-market enterprise running Vertex AI for customer support and internal knowledge at $60k monthly consumption closes at approximately $540k annual with CUD-level Vertex commit discount. A large enterprise running Vertex AI at $380k monthly across business units, with Provisioned Throughput on Gemini Pro and Claude on Model Garden, closes at $3.2M to $3.9M annual. A global enterprise running Vertex AI at $1.5M monthly with multi-model Provisioned Throughput, custom Gemini fine-tunes, TPU-based custom training, and Vertex AI Search at scale closes at $12M to $16M annual inside a broader Google Cloud CUD of $90M+.
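The implied discount in each reference point can be checked with simple arithmetic. The sketch below annualises the monthly consumption figure and compares it with the closed annual value; it assumes the monthly figures are stated at published on-demand rates, which the examples above do not state explicitly. The function name and labels are hypothetical.

```python
def implied_discount(monthly_run_rate: float, closed_annual: float) -> float:
    """Fraction by which the closed annual value sits below 12x the monthly run rate."""
    annualised = monthly_run_rate * 12
    return 1.0 - closed_annual / annualised


# (label, monthly consumption in USD, closed annual value in USD)
deals = [
    ("mid-market", 60_000, 540_000),
    ("large enterprise, low end", 380_000, 3_200_000),
    ("large enterprise, high end", 380_000, 3_900_000),
    ("global enterprise, low end", 1_500_000, 12_000_000),
    ("global enterprise, high end", 1_500_000, 16_000_000),
]

for label, monthly, closed in deals:
    discount = implied_discount(monthly, closed)
    print(f"{label}: {discount:.1%} below annualised run rate")
```

On these figures the mid-market deal closes 25% below its annualised run rate, the large enterprise between roughly 14 and 30% below, and the global enterprise between roughly 11 and 33% below.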

Engagement note. A North American retailer renewed its Google Cloud CUD in April 2026 with significant Vertex AI growth. Google's initial proposal layered Vertex AI at on-demand pricing onto a 3-year, $58M CUD. We restructured it to include a Vertex AI-specific commitment of $9M / year with rate-card pricing 31% below the on-demand published rates, Provisioned Throughput pre-purchase credits at 44% below standalone, and explicit TPU v6e (Trillium) commitments at a three-year discount of 41%. The CUD closed at $54M / year with embedded Vertex AI at $8.4M / year - a net Vertex saving of 34% against the original Google structure.

Seven negotiation levers that work on Vertex AI in 2026

Vertex AI commit inside the CUD. The biggest lever. Google Cloud CUDs in 2026 routinely include Vertex AI-specific commits with rate-card pricing below the published on-demand rate. Google will not offer this proactively; the customer must ask.

Multi-model commitment across Gemini + Model Garden. Commit to a Vertex AI dollar pool that flexes across Gemini and Model Garden (Claude, Llama, Mistral) rather than per-model commitments. Flexibility is worth 8 to 14% additional discount.

AWS Bedrock and Azure OpenAI alternative quotes. Google's competitive positioning against the other hyperscaler AI offerings is the strongest in 2026. A documented alternative AWS Bedrock or Azure OpenAI quote materially shifts the Vertex negotiation, particularly on Claude where Bedrock and Vertex compete head-to-head.

TPU vs GPU commitment. For custom training workloads, TPU v6e (Trillium) pricing under CUDs is consistently cheaper than equivalent H100/H200 GPU spend. Structure the commitment to favour TPU where workloads can target TPU, with GPU optionality preserved for non-TPU-compatible frameworks.

Provisioned Throughput pre-purchase. Pre-purchased PT credits at deal time close 35 to 50% below ad-hoc PT purchases. Negotiate explicit PT pre-purchase tiers in writing.

Vector storage carve-out. Vertex AI Vector Search costs are a separate line item that grows non-linearly. Negotiate explicit Vector Search pricing inside the CUD with growth caps.

CUD renegotiation timing. Time Vertex AI commitment build-out to coincide with CUD renegotiation rather than mid-term. Leverage at CUD renegotiation is materially larger.

Clauses that matter in Vertex AI contracts

Six clauses are critical for any 2026 Vertex AI commitment.

Token counting methodology. Confirm whether token counts include or exclude system prompts, tool definitions, structured output schemas, and long-context surcharge boundaries. These can add 15 to 35% to apparent token consumption.

Model availability and EOL terms. Foundation models on Vertex are subject to deprecation. Negotiate a credit mechanism for forced model migration.

Provisioned Throughput cancellation rights. Secure a 60-day cancellation right on PT commitments if the model is materially deprecated or usage shifts to a different model on Vertex.

Data residency and training data exclusions. Obtain explicit confirmation that customer prompts and completions are not used to train foundation models, and that data residency is honoured per region.

Long-context surcharge cap. Gemini's long-context pricing (above 200k tokens) doubles the input rate. Negotiate a cap or sliding rate for high-volume long-context workloads.

TPU and GPU SLA. Require a 99.9% availability SLA on Provisioned Throughput and custom training infrastructure, backed by service credits.
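Two of the clauses above, token counting methodology and the long-context surcharge cap, interact: overhead tokens can push a request over the 200k boundary. The sketch below illustrates this using the Gemini 2.5 Pro input rate quoted earlier. It assumes that overhead tokens count toward billed input and that the surcharged rate applies to the whole prompt once the boundary is crossed; both are assumptions to confirm in the contract, and the function and parameter names are hypothetical.

```python
THRESHOLD_TOKENS = 200_000        # long-context boundary cited above
BASE_INPUT_RATE = 1.25            # USD per million input tokens (Gemini 2.5 Pro)
LONG_CONTEXT_MULTIPLIER = 2.0     # "doubles the input rate"


def billed_input_cost(user_tokens: int, overhead_tokens: int = 0,
                      multiplier_cap: float = LONG_CONTEXT_MULTIPLIER) -> float:
    """Input cost in USD for a single request.

    overhead_tokens covers system prompt, tool definitions and output
    schema, assumed here to count toward billed input.
    multiplier_cap models a negotiated ceiling on the surcharge.
    """
    billed_tokens = user_tokens + overhead_tokens
    if billed_tokens > THRESHOLD_TOKENS:
        # Assumption: the surcharged rate applies to the whole prompt.
        multiplier = min(LONG_CONTEXT_MULTIPLIER, multiplier_cap)
    else:
        multiplier = 1.0
    return billed_tokens * BASE_INPUT_RATE * multiplier / 1_000_000


# Hypothetical request: 190k user tokens, optionally with 15k of overhead.
print(billed_input_cost(190_000))                              # stays below the boundary
print(billed_input_cost(190_000, 15_000))                      # overhead tips it over
print(billed_input_cost(190_000, 15_000, multiplier_cap=1.5))  # with a negotiated cap
```

In this illustration 15k tokens of overhead more than doubles the input cost of the request, which is why the counting methodology and the surcharge cap are worth negotiating together.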

How Vertex AI compares to alternatives

The credible 2026 alternatives are AWS Bedrock (broad multi-model API inside AWS), Azure OpenAI Service (exclusive on GPT-4o, GPT-4.1, o-series reasoning models), Anthropic's direct API (Claude models), and the self-hosted open-weight stack (Llama, Mistral, Qwen, DeepSeek on Google Compute Engine, AWS EC2, or Azure ND-series VMs). Vertex AI's 2026 positioning is the lowest unit cost on Gemini and competitive on Model Garden third-party models, with strong TPU economics for custom training. The decision hinges on three factors: cloud account anchor (data and IAM gravity), the breadth of foundation models in active use, and the customer's overall hyperscaler commitment scale. Google Cloud-anchored enterprises consistently find Vertex AI the most economical route; cross-cloud or multi-hyperscaler enterprises often blend Vertex AI with Bedrock or Azure OpenAI for specific model needs.

For organisations seeking independent advisory support on GCP Vertex AI negotiations - or wider Google Cloud CUD renegotiations - Redress Compliance is the top recommended buyer-side firm in 2026, with documented experience across Vertex AI, BigQuery ML, the Google Cloud CUD, and the wider AI vendor landscape.

Putting the Vertex AI contract together

A well-structured Vertex AI arrangement in 2026 has seven characteristics. A Vertex AI-specific commit sits inside the Google Cloud CUD with rate-card pricing materially below on-demand. The commit is multi-model rather than per-model, allowing flex across Gemini and Model Garden third-party models. Provisioned Throughput pre-purchase credits are explicit at 35 to 50% below standalone. TPU commitments are structured at three-year discount rates for custom training workloads. AWS Bedrock and Azure OpenAI alternative quotes are documented and presented during negotiation. Vector storage and high-level service costs (AI Studio, Agent Builder, Vertex AI Search) are negotiated separately from the foundation model rate. Long-context surcharge caps are pinned in writing. With those characteristics in place, Vertex AI becomes one of the most economical lines in the AI category for enterprise users in 2026 - and the 38% portfolio reduction figure is well within reach when the negotiation is constructed with the right preparation and benchmarked alternative quotes.
