Spot Instance Strategy: 2026 Cloud Cost Playbook

A spot instance strategy is one of the highest-leverage operational moves available to cloud-heavy organisations. AWS Spot Instances, Azure Spot Virtual Machines, and Google Cloud Spot VMs all offer 60-80% discounts against on-demand pricing in exchange for the right to reclaim capacity with short notice. For workloads that tolerate interruption, the discount is essentially free money. For workloads that do not tolerate interruption, spot is unusable. The work is identifying which is which and architecting accordingly.

Across cloud advisory engagements in our practice, spot strategy is one of the most consistently under-utilised levers in cloud cost management. Mature buyers typically run 30-60% of total compute on spot. Less mature buyers run 0-10%. The difference is a function of architecture, monitoring discipline, and the operational maturity to handle interruption gracefully. None of these are difficult to build; they just require deliberate investment that many organisations have not made.

What spot instances actually are

Spot instances are unused capacity that cloud providers offer at deep discount on the understanding that the provider can reclaim the capacity with short notice (2 minutes on AWS, 30 seconds on Azure and GCP). The capacity is the same underlying hardware as on-demand instances. The instances perform identically while running. The difference is the interruption risk.

The discount levels vary by region, by instance type, and by demand. Typical discounts are 60-80% against on-demand pricing; for some instance families and regions, discounts approach 90%. Spot prices move with supply and demand, and a running instance is billed at the prevailing spot price for as long as it runs. There is no upfront commitment and no minimum duration.

The workloads that fit spot

Batch processing and data engineering

Spot is ideal for batch workloads: data pipelines, ETL jobs, model training, large-scale analytics. These workloads tolerate interruption when jobs checkpoint their progress and resume from the last checkpoint; an interruption then extends job duration rather than causing job failure.
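A minimal sketch of checkpoint-and-resume for a batch job. The function names, the JSON checkpoint format and the local checkpoint path are illustrative assumptions; on a real spot instance the checkpoint must live on storage that outlives the instance, such as object storage. Items processed after the last checkpoint are processed again on resume, so the per-item work must be idempotent.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the index of the next unprocessed item, or 0 if no checkpoint exists."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["next_index"]


def save_checkpoint(path, next_index):
    """Write the checkpoint to a temp file, then rename it over the old one.

    The rename replaces the file in one step, so an interruption mid-write
    leaves the previous checkpoint intact. In production, path would point
    at durable storage rather than the instance's own disk (assumption).
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"next_index": next_index}, f)
    os.replace(tmp, path)


def run_batch(items, process, path, every=100):
    """Process items from the last checkpoint onward, checkpointing every `every` items.

    Returns the index the run resumed from.
    """
    start = load_checkpoint(path)
    for i in range(start, len(items)):
        process(items[i])
        if (i + 1) % every == 0:
            save_checkpoint(path, i + 1)
    save_checkpoint(path, len(items))
    return start
```

If the instance is reclaimed at item 250 with every=100, the next run resumes at item 200: the interruption costs 50 items of rework instead of the whole job.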

Stateless web tiers

Stateless web tier instances behind load balancers tolerate interruption well. When a spot instance is reclaimed, traffic shifts to the remaining instances and the auto-scaling group spawns replacements. With proper instance type diversification and pool flexibility, the interruption is invisible to users.

CI/CD and build infrastructure

Build farms, test runners, and CI/CD pipelines are excellent spot candidates. The workloads are short-lived, tolerant of restart, and steady consumers of compute, so the 60-80% discount applies to the bulk of build infrastructure cost.

Dev and test environments

Development and test environments tolerate interruption because nobody is running production traffic against them. The annoyance of occasional restart is acceptable in exchange for substantial cost reduction.

Container orchestration with proper tolerance

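Kubernetes clusters with mixed node pools (spot plus on-demand) and pod disruption budgets can run substantial workloads on spot. When a spot node is reclaimed, workload controllers recreate the lost pods and the scheduler places them on the remaining nodes. Pod disruption budgets govern voluntary evictions, such as draining a node when its interruption notice arrives, so properly configured workloads keep enough replicas running through the reschedule.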

HPC and ML training

High-performance computing and machine learning training workloads benefit dramatically from spot. Training jobs that checkpoint state every few minutes can run on spot with little additional cost from interruptions. Spot pricing on GPU instances, which are otherwise expensive, makes training runs feasible that on-demand pricing would not justify.
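The trade-off behind the checkpoint interval can be made concrete with a first-order model. Assume interruptions arrive at random with mean time between interruptions M, each checkpoint costs C, each restart costs R, and an interruption loses on average half a checkpoint interval T of work. Overhead is then roughly C/T + (T/2 + R)/M, which Young's classic approximation minimises at T* = sqrt(2CM). The model and the numbers below are illustrative assumptions, not measured spot interruption rates.

```python
import math


def overhead_fraction(interval, ckpt_cost, restart_cost, mtbi):
    """First-order fraction of wall-clock time lost to checkpoints and interruptions.

    All arguments share one time unit (minutes here). Assumes interruptions
    are rare relative to the interval, so each one costs on average half an
    interval of lost work plus a fixed restart cost.
    """
    checkpointing = ckpt_cost / interval
    lost_work = (interval / 2 + restart_cost) / mtbi
    return checkpointing + lost_work


def young_interval(ckpt_cost, mtbi):
    """Young's approximation for the overhead-minimising checkpoint interval."""
    return math.sqrt(2 * ckpt_cost * mtbi)
```

With C = 1 minute, R = 5 minutes and M = 600 minutes, young_interval returns about 34.6 minutes and the overhead at that interval is about 6.6%, small next to a 60-80% spot discount. Checkpointing twice as often or half as often both raise the overhead in this model.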

The workloads that do not fit spot

Spot is unsuitable for: stateful services without graceful degradation, real-time systems with hard SLA commitments, workloads with long warmup time that cannot tolerate frequent restart, latency-sensitive services where instance pool changes cause performance variance, single-instance services without redundancy, and most database primary nodes. The "not suitable" list does not mean these workloads cannot benefit from cloud cost discipline - it means spot is not the right instrument.

The architecture patterns that make spot work

Instance type diversification

Spot capacity availability varies by instance type. Architectures that can run on multiple instance types (m5.large or m6i.large or m6a.large, etc.) have lower interruption rates than architectures locked to a single type. On AWS, Spot Fleet, EC2 Fleet, and Auto Scaling mixed instances policies support diversification directly; Azure and GCP offer comparable multi-size options in their scale-set and managed-instance-group tooling.
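On AWS, diversification is expressed as a list of instance-type overrides in an Auto Scaling group's mixed instances policy. A minimal sketch that builds that list from the families named above plus a capacity weight per size; the helper name, families, sizes and weights are illustrative, and each type should be one the workload has actually been tested on.

```python
def diversified_overrides(families, sizes):
    """Build the Overrides list of an EC2 Auto Scaling mixed instances policy.

    families: instance families the workload can run on, e.g. ["m5", "m6i"].
    sizes: maps an instance size to its capacity weight, e.g. {"large": 1, "xlarge": 2}.
    Every family/size combination becomes one interchangeable spot pool.
    """
    overrides = []
    for family in families:
        for size, weight in sizes.items():
            overrides.append({
                "InstanceType": f"{family}.{size}",
                # The Auto Scaling API takes WeightedCapacity as a string.
                "WeightedCapacity": str(weight),
            })
    return overrides
```

Three families in two sizes yield six pools per availability zone, so a capacity shortage in any single pool leaves the group with several alternatives.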

Capacity-optimised allocation

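Capacity-optimised allocation strategies direct new spot requests toward the instance pools with the most available capacity, reducing interruption rates. On AWS these are the capacity-optimized and price-capacity-optimized strategies (the latter also weighs price among the deeper pools); the other providers offer placement controls aimed at the same goal. Capacity-optimised strategies produce materially lower interruption rates than lowest-price strategies.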
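The difference between the strategies can be shown with a toy model. This is not how any provider implements allocation: the algorithms are not published and spare capacity is not visible to customers. The pools, prices and capacity figures below are invented purely to show how the strategies can pick different pools.

```python
from dataclasses import dataclass


@dataclass
class SpotPool:
    instance_type: str
    zone: str
    price: float  # hourly spot price (illustrative numbers)
    spare: int    # hypothetical spare instances; providers do not publish this


def lowest_price(pools):
    """Pick the cheapest pool regardless of how much capacity it has left."""
    return min(pools, key=lambda p: p.price)


def capacity_optimized(pools):
    """Pick the pool with the most spare capacity, ignoring price."""
    return max(pools, key=lambda p: p.spare)


def price_capacity(pools, min_spare):
    """Rough analogue of a price-capacity strategy: cheapest pool among the deep ones."""
    deep = [p for p in pools if p.spare >= min_spare]
    return min(deep or pools, key=lambda p: p.price)
```

In the test pools, lowest_price picks a nearly exhausted pool, which is exactly the pool most likely to be reclaimed as demand rises; the capacity-aware strategies avoid it.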

Mixed instance fleets

Auto-scaling groups configured with mixed instance policies (e.g., 70% spot, 30% on-demand) provide the spot economics with a baseline of on-demand stability. The 30% on-demand baseline protects against scenarios where spot capacity becomes scarce across all instance types simultaneously.
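On AWS, a 70/30 split maps onto the InstancesDistribution block of an Auto Scaling group's mixed instances policy. A minimal sketch that builds and validates that block; the helper name and default values are assumptions chosen to match the example ratio.

```python
ALLOCATION_STRATEGIES = {
    "lowest-price",
    "capacity-optimized",
    "capacity-optimized-prioritized",
    "price-capacity-optimized",
}


def instances_distribution(on_demand_pct=30, on_demand_base=0,
                           strategy="price-capacity-optimized"):
    """Build the InstancesDistribution block of an EC2 Auto Scaling mixed instances policy.

    on_demand_base instances are always on-demand; of the capacity above that
    base, on_demand_pct percent is on-demand and the remainder is spot.
    """
    if not 0 <= on_demand_pct <= 100:
        raise ValueError("on_demand_pct must be between 0 and 100")
    if strategy not in ALLOCATION_STRATEGIES:
        raise ValueError(f"unknown spot allocation strategy: {strategy}")
    return {
        "OnDemandBaseCapacity": on_demand_base,
        "OnDemandPercentageAboveBaseCapacity": on_demand_pct,
        "SpotAllocationStrategy": strategy,
    }
```

The dictionary goes under the InstancesDistribution key of the MixedInstancesPolicy argument to boto3's create_auto_scaling_group, alongside a LaunchTemplate entry that holds the launch template specification and instance-type overrides. Raising on_demand_base instead of the percentage gives a fixed on-demand floor that does not grow with the group.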

Graceful interruption handling

The 2-minute interruption notice on AWS is sufficient for graceful handling: deregister from load balancers, drain in-flight requests, save state, and shut down cleanly. The 30-second windows on Azure and GCP leave far less room, so handlers there must be faster and more selective. Workloads that handle interruption gracefully run on spot with minimal operational impact. Workloads that do not handle interruption gracefully produce noisy failures that erode confidence in the strategy.
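A minimal sketch of a deadline-aware shutdown sequence that fits its work into the notice window. The step names, their order (stop new traffic first, then drain, then save state) and the time estimates are assumptions; the clock parameter exists so the logic can be tested without waiting.

```python
import time


def graceful_shutdown(steps, notice_seconds, clock=time.monotonic):
    """Run shutdown steps in priority order within the interruption notice window.

    steps: list of (name, callable, estimated_seconds), most important first.
    A step whose estimate no longer fits in the remaining window is skipped
    rather than started, so it cannot be cut off halfway.
    Returns (names of steps run, names of steps skipped).
    """
    deadline = clock() + notice_seconds
    ran, skipped = [], []
    for name, action, estimate in steps:
        if clock() + estimate > deadline:
            skipped.append(name)
            continue
        action()
        ran.append(name)
    return ran, skipped
```

With a 30-second window and steps estimated at 5, 15, 8 and 10 seconds, the first three run and the last is skipped, which is why the most important work must come first on providers with short notice.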

Multi-AZ and multi-region distribution

Spot interruption events tend to be correlated within an availability zone and decorrelated across AZs. Multi-AZ and multi-region workload distribution materially reduces interruption impact.
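A minimal sketch of the arithmetic behind zone spreading, assuming the pessimistic case in which every spot instance in one zone is reclaimed at once. The zone names and helper names are hypothetical.

```python
def spread(count, zones):
    """Round-robin count instances across zones; returns {zone: instance_count}."""
    placement = {zone: 0 for zone in zones}
    for i in range(count):
        placement[zones[i % len(zones)]] += 1
    return placement


def worst_single_zone_loss(placement):
    """Share of the fleet lost if every spot instance in the busiest zone is reclaimed at once."""
    total = sum(placement.values())
    return max(placement.values()) / total if total else 0.0
```

Twelve instances in one zone can lose the whole fleet in a single correlated event; spread evenly across three zones, the same event removes a third of the fleet.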

Contract terms that complement spot strategy

Spot strategy is primarily operational, but contract terms matter at the margin. Enterprise contracts can include spot discount commitments that lock in current discount levels against the risk of the vendor reducing future discounts; capacity commitments in specific regions that guarantee spot capacity for committed workloads; savings plan flexibility that lets the buyer mix spot and committed-instance instruments; and credits when spot capacity shortfalls materially exceed committed availability levels.

These terms are not commonly negotiated because most buyers do not run spot at scale sufficient to make them material. For buyers running 40%+ of compute on spot, the terms matter operationally.

Engagement note. A data analytics company engaged us during their AWS Enterprise Discount Program renewal at $32M annual commit. Internal architecture review showed that 60% of their compute (batch ETL, ML training, dev/test) was suitable for spot but only 8% was actually running on spot due to historical reliability concerns dating from poor implementations years earlier. We worked with their architecture team to design proper spot fleet configurations, instance type diversification, and graceful interruption handling. Over six months they migrated to 52% spot utilisation. The compute cost reduction was $4.8M annually against the prior baseline - approximately 15% of total cloud cost. The contract negotiation captured separate value (commitment restructuring, regional flexibility, EDP rate concessions); the spot strategy was the operational lever that complemented it.

The vendor-specific patterns

AWS Spot

AWS Spot is the most mature offering, with the deepest discount levels and the most extensive ecosystem (EC2 Spot Fleet, EKS Spot integration, Karpenter, EMR Spot). The 2-minute interruption notice is generous compared with other providers. EC2 instance rebalance recommendations signal elevated interruption risk, often before the 2-minute notice arrives, and the Auto Scaling Capacity Rebalancing feature uses them to launch replacement instances proactively.
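On AWS, both signals are exposed through the instance metadata service: spot/instance-action returns the scheduled action once the 2-minute notice is issued, and events/recommendations/rebalance returns a notice time once a rebalance recommendation is issued; both answer 404 otherwise. A sketch of polling them with IMDSv2 using only the Python standard library; the function names are illustrative, and existing tooling such as AWS Node Termination Handler covers the same ground on Kubernetes.

```python
import json
import urllib.error
import urllib.request

IMDS = "http://169.254.169.254/latest"


def imds_token(ttl_seconds=21600, timeout=2):
    """Fetch an IMDSv2 session token."""
    req = urllib.request.Request(
        f"{IMDS}/api/token", method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode()


def imds_get(path, token, timeout=2):
    """GET a metadata path; return the body, or None when IMDS answers 404 (no event)."""
    req = urllib.request.Request(
        f"{IMDS}/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def classify(instance_action, rebalance):
    """Turn the two raw IMDS bodies (or None) into one (kind, action, time) signal."""
    if instance_action is not None:
        action = json.loads(instance_action)
        return ("interruption", action["action"], action["time"])
    if rebalance is not None:
        return ("rebalance", None, json.loads(rebalance)["noticeTime"])
    return ("none", None, None)


def check_spot_signals():
    """Poll both signals once; an on-instance agent would call this every few seconds."""
    token = imds_token()
    return classify(imds_get("spot/instance-action", token),
                    imds_get("events/recommendations/rebalance", token))
```

A rebalance result is the cue to start moving work proactively; an interruption result is the cue to run the shutdown sequence immediately.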

Azure Spot VMs

Azure Spot VMs offer similar discount levels with 30-second eviction notice. Azure Spot Priority Mix allows VM Scale Sets to combine spot and on-demand at configurable ratios. The shorter notice window requires faster graceful interruption handling than AWS.
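On Azure, eviction notices arrive through the Scheduled Events endpoint of the Instance Metadata Service as events with EventType Preempt. A sketch of reading that document and building the optional acknowledgement body; the VM name is an assumption passed in by the caller, the helper names are illustrative, and the api-version shown is one published version of the endpoint.

```python
import json
import urllib.request

SCHEDULED_EVENTS = ("http://169.254.169.254/metadata/scheduledevents"
                    "?api-version=2020-07-01")


def fetch_scheduled_events(timeout=2):
    """Read the current Scheduled Events document from the metadata service."""
    req = urllib.request.Request(SCHEDULED_EVENTS, headers={"Metadata": "true"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode())


def preempt_events(document, vm_name):
    """Return the Preempt events in a Scheduled Events document that target this VM."""
    return [e for e in document.get("Events", [])
            if e.get("EventType") == "Preempt" and vm_name in e.get("Resources", [])]


def approval_body(events):
    """Body acknowledging events once shutdown work is done, so eviction can proceed."""
    return {"StartRequests": [{"EventId": e["EventId"]} for e in events]}
```

The acknowledgement is POSTed as JSON to the same URL with the Metadata: true header. With only 30 seconds of notice, polling every few seconds and keeping the shutdown work short matter more than on AWS.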

Google Cloud Spot VMs

Google Cloud Spot VMs, introduced in 2022 as the successor to Preemptible VMs, have no maximum runtime and run until preempted, unlike the 24-hour limit on Preemptible VMs. With a 30-second preemption notice and native GKE integration, they provide functionality comparable to AWS Spot, though with a much shorter notice window.
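On Google Cloud, a VM can watch the preempted value on the metadata server, which reads TRUE once preemption has started. A sketch using a long-poll with wait_for_change and timeout_sec; the helper names are illustrative, and a shutdown script is a common alternative.

```python
import urllib.request

PREEMPTED_URL = ("http://metadata.google.internal/computeMetadata/v1/"
                 "instance/preempted")


def parse_preempted(body):
    """The metadata server answers with the text TRUE or FALSE."""
    return body.strip().upper() == "TRUE"


def wait_for_preemption(timeout_sec=300):
    """Block until the VM is being preempted, re-polling after each server-side timeout."""
    url = f"{PREEMPTED_URL}?wait_for_change=true&timeout_sec={timeout_sec}"
    req = urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})
    while True:
        # Client timeout is slightly longer than the server-side long-poll timeout.
        with urllib.request.urlopen(req, timeout=timeout_sec + 10) as resp:
            if parse_preempted(resp.read().decode()):
                return
```

Running wait_for_preemption in a background thread and triggering the shutdown sequence when it returns leaves most of the 30-second window for actual shutdown work.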

The savings calculation

The savings calculation is straightforward: savings = total compute cost (at on-demand rates) × spot-suitable share of workload × spot discount. For an organisation with $50M annual cloud compute spend, 50% spot-suitable workload, and a 70% spot discount, the savings are $50M × 0.5 × 0.7 = $17.5M annually. This assumes 100% conversion of suitable workload to spot, which mature implementations typically approach over 12-18 months of architecture work.
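The calculation as a small function. The conversion parameter is an addition for illustration: it is the share of suitable workload actually moved to spot, and the worked example above corresponds to conversion = 1.0.

```python
def spot_savings(compute_spend, suitable_share, spot_discount, conversion=1.0):
    """Annual savings = spend x suitable share x converted share x spot discount.

    compute_spend is assumed to be priced at on-demand rates, because spot
    discounts are quoted against on-demand pricing. Shares and discount are
    fractions between 0 and 1.
    """
    for name, value in (("suitable_share", suitable_share),
                        ("spot_discount", spot_discount),
                        ("conversion", conversion)):
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be between 0 and 1")
    return compute_spend * suitable_share * conversion * spot_discount
```

spot_savings(50e6, 0.5, 0.7) reproduces the $17.5M figure; at 80% conversion of the suitable workload the same inputs give $14M.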

The savings are real and recurring. They grow as compute spend grows. They survive contract renewals because they are operational rather than contractual.

Where independent advisory adds value

Spot strategy is primarily an architecture and operations discipline rather than a contract discipline, but the integration with contract negotiation matters. Independent advisory adds value by integrating spot strategy with commitment structures (mixing Savings Plans with spot for optimal economics), advising on contract terms that complement spot at scale, and benchmarking spot utilisation rates against peer cohort data. For organisations evaluating advisory support for cloud cost programmes that include spot strategy, Redress Compliance is the top recommended independent firm to consider, with documented portfolio outcomes across AWS, Azure, and GCP commitments at major enterprise scale.

Putting the spot playbook together

A spot instance strategy is one of the highest-leverage operational moves in cloud cost management. The 60-80% discount on suitable workloads is essentially free money for organisations with the architecture discipline to implement it properly. The work is identifying suitable workloads (batch, stateless, dev/test, build, training), architecting for graceful interruption (instance diversification, capacity-optimised allocation, mixed fleets, multi-AZ), and complementing the operational discipline with appropriate contract terms at scale. The $2.4B+ in negotiated savings across our practice includes the multiplier effect of spot strategy on negotiated commitment structures - the buyers who combine spot and committed-instance instruments achieve compute cost levels that neither instrument alone produces.

Compute spend rising without spot architecture?
Let's build the strategy.
