Improve spot HA by utilising ASG capacity rebalance #11

kieranbrown · 2024-01-23T12:53:00Z

Capacity rebalance helps by being proactive in trying to replace Spot Instances before they are interrupted. Full docs - https://docs.aws.amazon.com/autoscaling/ec2/userguide/ec2-auto-scaling-capacity-rebalancing.html

Current Behaviour
When a spot instance receives its 2-minute interruption warning, nothing happens. The instance is terminated after 2 minutes, and a replacement is started only after the original has been terminated.

New Behaviour
When a spot instance receives its 2-minute interruption warning, the ASG immediately provisions a new instance, which will hopefully boot and move the floating EIP before the original is terminated. With this approach, there is minimal downtime when spot instances are terminated.
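To make the difference concrete, here is a rough back-of-the-envelope sketch in Python. It is only an illustration under stated assumptions: `boot_seconds` is a hypothetical figure for how long a replacement takes to boot and move the EIP (not a measured value), the 2-minute warning is treated as the only head start, and any delay before the ASG reacts is ignored.

```python
# Spot interruption notice period described above (2 minutes).
WARNING_SECONDS = 120


def downtime_without_rebalance(boot_seconds: float) -> float:
    """Current behaviour: the replacement starts only after termination,
    so the whole boot time (hypothetical input) is downtime."""
    return float(boot_seconds)


def downtime_with_rebalance(boot_seconds: float) -> float:
    """New behaviour: the replacement starts when the warning arrives,
    so only boot time beyond the warning window is downtime."""
    return max(0.0, float(boot_seconds) - WARNING_SECONDS)
```

Under these assumptions, a replacement that needs 60 seconds to take over would be ready before the original is terminated, while one that needs 150 seconds would still leave a gap of about 30 seconds instead of 150.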

You can test this using AWS Fault Injection. In the EC2 management console, click Spot Requests in the sidebar, select a spot request, click Actions, then Initiate Interruption.

gabrieleolmi · 2024-05-21T16:42:12Z

The only thing I would change is the ha_additional_instance_types variable. Would it be better to use an empty array by default? Otherwise, after an update, users would have an ASG with a behavior and an instance type that they did not choose.

This pull request should fix the problems I'm having while using fck-nat. My spot instance often fails to start because there is no spot capacity available. For example, these are the errors I often see when using the fck-nat instance:

kieranbrown · 2024-05-22T21:00:40Z

@gabrieleolmi

> The only thing I would change is the ha_additional_instance_types variable. Would it be better to use an empty array by default? Otherwise, after an update, users would have an ASG with a behavior and an instance type that they did not choose.

It's been a while since I last looked at this, but IIRC capacity rebalance requires a mixed instance policy, and within a mixed instance policy you need to define a minimum of 2 instance types. Adding ha_additional_instance_types and defaulting it to the next cheapest instance type was the only sensible approach I could think of.

Defaulting ha_additional_instance_types to an empty array would cause an error if end users set use_spot_instances = true without explicitly setting ha_additional_instance_types to their preferred failover instance.
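As a minimal sketch of the constraint described above, the following Python models how the list of instance types might be assembled and where an empty default would fail. It is not the module's actual code: `instance_type` is a hypothetical name for the primary instance type input, the instance types in the usage note are purely illustrative, and the "at least 2 types" rule is taken from the recollection above rather than verified against AWS.

```python
def spot_instance_types(
    instance_type: str,
    use_spot_instances: bool,
    ha_additional_instance_types: list[str],
) -> list[str]:
    """Return the instance types a mixed instance policy would list.

    `instance_type` is a hypothetical primary-type input; the minimum of
    two types is the assumption stated in the comment above.
    """
    if not use_spot_instances:
        # On-demand path: no mixed instance policy, a single type is fine.
        return [instance_type]

    # Primary type first, then failover types, dropping duplicates
    # while preserving order.
    types = list(dict.fromkeys([instance_type, *ha_additional_instance_types]))

    if len(types) < 2:
        raise ValueError(
            "use_spot_instances = true needs at least one distinct entry "
            "in ha_additional_instance_types"
        )
    return types
```

For example, `spot_instance_types("t4g.nano", True, ["t4g.micro"])` yields both types, whereas passing an empty list with `use_spot_instances` set to true raises the error, which is the failure mode an empty default would expose.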

Perhaps just some documentation to clear up this behaviour in the README would be enough.

@RaJiska it would be good to hear your thoughts on this.

kieranbrown added 3 commits January 23, 2024 12:45

Improve spot HA by utilising ASG capacity rebalance

c9c5483

Fix instance not recreating when switching lifecycle

a87c17d

Update docs

3f52571

Merge branch 'main' into capacity-rebalance

51b51af

kieranbrown closed this Aug 6, 2024

kieranbrown deleted the capacity-rebalance branch August 6, 2024 16:12

kieranbrown restored the capacity-rebalance branch August 6, 2024 16:16

kieranbrown reopened this Aug 6, 2024
