Run EKS Nodes on EC2 Spot Instances
AWS offers managed node groups to provision worker nodes in an EKS cluster. You get a lot of benefits:
- AWS manages the underlying Auto Scaling group and EC2 instances.
- The Auto Scaling group will span all the subnets and availability zones you've specified.
- Automated security: the nodes use an AWS-managed AMI, and AWS tracks CVEs and security patches for it (you are responsible for deploying the patched AMIs).
As of August 2020, managed node groups do not support spot instances. So if you want to run Kubernetes workloads on the cheap (spot instances can be up to 90% cheaper than on-demand), you'll have to roll your own. Terraform can make this almost as easy as using managed node groups, though it requires a little more setup.
Terraform
I assume you already have a VPC and EKS cluster running. Setting up an EKS cluster is simple in Terraform:
resource "aws_eks_cluster" "main" {
  name     = var.cluster_name
  role_arn = var.cluster_role_arn

  vpc_config {
    subnet_ids = var.cluster_subnet_ids
  }
}
See the Terraform documentation for complete configuration options.
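The launch template in the next section attaches an instance profile via var.node_group_instance_profile. If you don't already have one, here is a minimal sketch of the IAM role worker nodes typically need; the resource names are hypothetical, but the three attached AWS managed policies are the standard ones for EKS workers:

```hcl
resource "aws_iam_role" "node_group" {
  name = "eks-node-group" // hypothetical name

  // allow EC2 instances to assume this role
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "ec2.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "node_group" {
  for_each = toset([
    "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy",
    "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy",
    "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly",
  ])

  role       = aws_iam_role.node_group.name
  policy_arn = each.value
}

resource "aws_iam_instance_profile" "node_group" {
  name = aws_iam_role.node_group.name
  role = aws_iam_role.node_group.name
}
```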
Launch Template
First we need a launch template, which defines the configuration our instances will launch with. We use the aws_ssm_parameter data source to look up the AWS EKS optimized AMI. It is important to tag the instances with kubernetes.io/cluster/${var.cluster_name} = owned; the EKS control plane uses this tag to discover worker nodes.
data "aws_ssm_parameter" "eks_ami" {
  name = "/aws/service/eks/optimized-ami/${var.kubernetes_version}/amazon-linux-2/recommended/image_id"
}
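The launch template below also references a data.aws_security_groups.main data source for the node security groups. A minimal sketch of what that lookup might look like, assuming a var.vpc_id variable and that your security groups carry the cluster tag:

```hcl
data "aws_security_groups" "main" {
  // restrict the lookup to the cluster's VPC
  filter {
    name   = "vpc-id"
    values = [var.vpc_id] // hypothetical variable
  }

  // match security groups tagged as belonging to this cluster
  tags = {
    "kubernetes.io/cluster/${var.cluster_name}" = "owned"
  }
}
```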
resource "aws_launch_template" "eks_node_group" {
  name_prefix = "${var.name}-"
  description = var.description

  block_device_mappings {
    device_name = "/dev/sda1" // the root volume

    ebs {
      volume_size           = var.root_volume_size
      delete_on_termination = true
    }
  }

  iam_instance_profile {
    name = var.node_group_instance_profile
  }

  image_id      = data.aws_ssm_parameter.eks_ami.value
  instance_type = var.instance_type

  // https://amzn.to/34ajjzB
  // https://bit.ly/3iEKh6h
  user_data = base64encode(templatefile("${path.module}/user_data.sh.tpl", { cluster_name = var.cluster_name }))

  vpc_security_group_ids = data.aws_security_groups.main.ids

  // only request spot capacity when the variable is set
  dynamic "instance_market_options" {
    for_each = var.use_spot_instances ? [1] : []

    content {
      market_type = "spot"
    }
  }

  monitoring {
    enabled = true
  }

  tag_specifications {
    resource_type = "instance"

    tags = merge(
      var.tags,
      { "kubernetes.io/cluster/${var.cluster_name}" = "owned" }
    )
  }

  tags = var.tags
}
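The user_data above renders a user_data.sh.tpl template with the cluster name. A minimal sketch of what that template might contain: the EKS optimized AMI ships with /etc/eks/bootstrap.sh, which configures the kubelet and registers the node with the control plane. The ${cluster_name} placeholder is filled in by templatefile.

```shell
#!/bin/bash
set -o xtrace

# bootstrap.sh ships with the EKS optimized AMI; it configures the
# kubelet and joins the node to the named cluster
/etc/eks/bootstrap.sh "${cluster_name}"
```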