Niko Kiirala

Posted on Dec 12

NATing on the cheap on AWS

#aws #network #devops #terraform

Let's consider this case: you have something running in AWS and that something needs occasional access to the public internet. You can either give it a public IP address, or put a NAT between your private network and the public internet. Let's consider the second option: setting up a NAT.

In my use case, specifically, I'm running an EC2 instance that has a small web server on it. I also have a CodePipeline + CodeBuild setup that builds and deploys a new version of the server whenever the production branch in the Git repo is pushed to. This CodePipeline + CodeBuild setup is excellently suited for running in a private network as there's no point in having it visible from the public internet. It does, however, need to fetch stuff from the public internet - get dependencies from package repositories and such. Thus I need to have a NAT gateway, but I have no need for all the bells and whistles of AWS-provided NAT gateways. Not to mention, I have just a small budget as well.

Let me be clear on one thing from the get-go: AWS-provided NAT gateways work great. Fast, stable, simple to use - all sorts of things you'd want for production infrastructure. I'd also like to give a shout-out to alterNAT, a production-grade replacement for NAT gateways, from which I've copied bits and pieces for my version.

In case you have some testing environment, low-usage servers or such, a NAT gateway can get costly. The gateway has a base fee, a per-gigabyte fee for traffic and you need to pay for a public IP, too. These can easily end up costing more than the rest of your usage - especially so if you qualify for and use free tier services.

If you're already running an EC2 instance with a public IP, it's possible to set up a NAT at no extra cost. Running a low-spec EC2 instance just for this purpose is fairly cheap, too. The solution is a NAT instance.

Set up the EC2 instance

You can use an EC2 instance you already have or set up a new one just for the purpose. This instance needs three important settings:

A public IP address: the NAT instance needs to be in a public subnet of a VPC with an Internet Gateway attached, and it needs to have a public IP address. All traffic from your internal subnets will appear to originate from this IP.
Source/destination check disabled: by default, EC2 instances can only send and receive their own traffic. Incoming traffic whose destination IP is not one of the instance's own IPs is filtered out, as is outgoing traffic whose source IP doesn't belong to the instance. When the instance is routing traffic for others, it needs to handle both kinds, so we need to disable the check.
User data blob: this contains the necessary instructions to set up the NAT functionality at every boot. The user data blob gets processed by cloud-init that's active by default in the Amazon Linux 2023 AMI that I'm using.

I'll be using Terraform to set up my instance.

data "aws_ami" "arm64-ecs" {
  owners      = ["amazon"]
  name_regex  = "^al2023-ami-ecs"
  most_recent = true
  filter {
    name   = "architecture"
    values = ["arm64"]
  }
}

resource "aws_instance" "arm64_machine" {
  ami           = data.aws_ami.arm64-ecs.image_id
  instance_type = "t4g.small"

  availability_zone           = data.aws_availability_zones.main.names[0]
  iam_instance_profile        = aws_iam_instance_profile.machine_profile.name
  key_name                    = aws_key_pair.machine_access.key_name
  subnet_id                   = aws_subnet.main[0].id
  vpc_security_group_ids      = [aws_security_group.ecs_on_ec2.id]
  associate_public_ip_address = true
  source_dest_check           = false

  user_data                   = local.instance_cloudconfig
  user_data_replace_on_change = true

  tags = merge(local.common_tags, {
    Name             = "${local.service_name}-arm64"
  })
}

The three important bits described above are already visible in this code snippet:

associate_public_ip_address = true
source_dest_check = false
user_data = local.instance_cloudconfig

User data blob

I could build a new AMI that has the necessary bits and pieces to set up the instance to perform NAT at every boot. That's a fairly heavy process, though, so instead of that I'm using user data to drop a small script and a few configuration files to the instance at boot.

locals {
  instance_cloudconfig_data = {
    write_files = [
      {
        path        = "/var/lib/cloud/scripts/per-boot/tinynat.sh"
        permissions = "0744"
        owner       = "root:root"
        content     = file("initscript/tinynat.sh")
      },
      {
        path        = "/etc/tinynat.conf"
        permissions = "0644"
        owner       = "root:root"
        content     = file("initscript/tinynat.conf")
      },
      {
        path        = "/etc/tinynat-route-table-ids.csv"
        permissions = "0644"
        owner       = "root:root"
        content     = aws_route_table.private.id
      },
      {
        path        = "/etc/tinynat-private-nets.csv"
        permissions = "0644"
        owner       = "root:root"
        content     = join(",", aws_subnet.private[*].cidr_block)
      },
    ]
  }
  instance_cloudconfig = <<-END
    #cloud-config
    ${jsonencode(local.instance_cloudconfig_data)}
  END
}

The instance_cloudconfig_data is a Terraform object, containing cloud-init instructions. Here, we drop four files in the file system: the setup script, a config file containing IDs of route tables belonging to private networks (likely just one ID), a config file containing IP address ranges of private networks that should get routed and finally a third config file that tells where to find the two other config files.
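
To make the file formats concrete: each of the two CSV files holds a single comma-separated line, which the setup script splits into a bash array. Here is a minimal sketch of that parsing, using a temporary file and made-up CIDR ranges in place of the real /etc/tinynat-private-nets.csv:

```bash
#!/bin/bash
# Sketch: split a one-line CSV file into a bash array, the same way
# tinynat.sh reads its inputs. The CIDR values are made-up examples.
csv_file=$(mktemp)
# No trailing newline, just like the string Terraform's join() produces
printf '%s' "10.0.10.0/24,10.0.11.0/24" > "$csv_file"

# read returns non-zero at EOF without a newline, but still fills the array
IFS=',' read -r -a cidrs < "$csv_file" || true

for cidr in "${cidrs[@]}"; do
  echo "private network: $cidr"
done
rm -f "$csv_file"
```

With a single route table, the route table file is simply a one-element list and parses the same way.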

The setup script is stored under /var/lib/cloud/scripts/per-boot so that it runs every time the instance is booted up. It's making iptables changes that are not persisted across power cycles, so it needs to get run at every boot.
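
Since the script runs at every boot, and may also be re-run by hand, it helps that rules are added with a check-then-append idiom, so a repeated run doesn't stack duplicate rules. A minimal sketch of that idiom on its own follows. The IPTABLES variable and the ensure_masquerade name are my own additions for illustration, so that the function can be dry-run without root:

```bash
#!/bin/bash
# Sketch of the check-then-append idiom: "iptables -C" tests whether a rule
# exists, and "-A" appends it only when that check fails.
# IPTABLES is a hypothetical override, handy for dry runs with a stub command.
IPTABLES="${IPTABLES:-iptables}"

ensure_masquerade() {
  local adapter="$1" cidr="$2"
  "$IPTABLES" -t nat -C POSTROUTING -o "$adapter" -s "$cidr" -j MASQUERADE 2>/dev/null ||
    "$IPTABLES" -t nat -A POSTROUTING -o "$adapter" -s "$cidr" -j MASQUERADE
}

# Example (needs root on a real system):
#   ensure_masquerade ens5 10.0.10.0/24
```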

This script, initscript/tinynat.sh, is as follows:

#!/bin/bash

# Based on alterNAT
# https://github.com/chime/terraform-aws-alternat/blob/main/scripts/alternat.sh

# Send output to a file and to the console
# Credit to the alestic blog for this one-liner
# https://alestic.com/2010/12/ec2-user-data-output/
exec > >(tee /var/log/alternat.log|logger -t user-data -s 2>/dev/console) 2>&1

shopt -s expand_aliases

panic() {
  [ -n "$1" ] && echo "$1"
  echo "tinyNAT setup failed"
  exit 1
}

load_config() {
   if [ -f "$CONFIG_FILE" ]; then
      . "$CONFIG_FILE"
   else
      panic "Config file $CONFIG_FILE not found"
   fi
   validate_var "route_table_ids_csv" "$route_table_ids_csv"
   validate_var "private_net_cidr_ranges" "$private_net_cidr_ranges"
}

validate_var() {
   local var_name="$1"
   local var_val="$2"
   if [ -z "$var_val" ]; then
      panic "Config var \"$var_name\" is unset"
   fi
}

# configure_nat() sets up Linux to act as a NAT device.
# See https://docs.aws.amazon.com/vpc/latest/userguide/VPC_NAT_Instance.html#NATInstance
configure_nat() {
   echo "Beginning NAT configuration"

   local adapter=ens5

   echo "Configuration before enabling NAT:"
   sysctl net.ipv4.ip_forward net.ipv4.conf.${adapter}.send_redirects net.ipv4.ip_local_port_range
   echo "iptables default"
   iptables -n -v -L
   echo "iptables NAT"
   iptables -n -v -t nat -L

   echo "Enabling NAT..."

   IFS=',' read -r -a vpc_cidrs < "${private_net_cidr_ranges}"
   for cidr in "${vpc_cidrs[@]}";
   do
      echo "Adding routing for private network ${cidr}"
      (iptables -t nat -C POSTROUTING -o ${adapter} -s "$cidr" -j MASQUERADE 2>/dev/null ||
      iptables -t nat -A POSTROUTING -o ${adapter} -s "$cidr" -j MASQUERADE) ||
      panic
   done

   # Create the DOCKER-USER chain unless Docker has already created it
   iptables -N DOCKER-USER 2>/dev/null || true
   iptables -C DOCKER-USER -i ${adapter} -o ${adapter} -j ACCEPT 2>/dev/null ||
      iptables -I DOCKER-USER -i ${adapter} -o ${adapter} -j ACCEPT || panic

   iptables -n -v -L DOCKER-USER
   iptables -n -v -t nat -L POSTROUTING

   echo "NAT configuration complete"
}

# First try to replace an existing route
# If no route exists already (e.g. first time set up) then create the route.
configure_route_table() {
   echo "Configuring route tables"

   IFS=',' read -r -a route_table_ids < "${route_table_ids_csv}"

   for route_table_id in "${route_table_ids[@]}"
   do
      echo "Attempting to find route table $route_table_id"
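      local rtb_id
      rtb_id=$(aws ec2 describe-route-tables --filters "Name=route-table-id,Values=${route_table_id}" --query 'RouteTables[0].RouteTableId' | tr -d '"')
      # With text output, a missing table comes back as the literal string "None"
      if [ -z "$rtb_id" ] || [ "$rtb_id" = "None" ]; then
         panic "Unable to find route table $route_table_id"
      fi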

      echo "Found route table $rtb_id"
      echo "Replacing route to 0.0.0.0/0 for $rtb_id"
      aws ec2 replace-route --route-table-id "$rtb_id" --instance-id "$INSTANCE_ID" --destination-cidr-block 0.0.0.0/0
      if [ $? -eq 0 ]; then
         echo "Successfully replaced route to 0.0.0.0/0 via instance $INSTANCE_ID for route table $rtb_id"
         continue
      fi

      echo "Unable to replace route. Attempting to create route"
      aws ec2 create-route --route-table-id "$rtb_id" --instance-id "$INSTANCE_ID" --destination-cidr-block 0.0.0.0/0
      if [ $? -eq 0 ]; then
         echo "Successfully created route to 0.0.0.0/0 via instance $INSTANCE_ID for route table $rtb_id"
      else
         panic "Unable to replace or create the route!"
      fi
   done
}

# tinyNAT config file containing inputs needed for initialization
CONFIG_FILE="/etc/tinynat.conf"

load_config

curl_cmd="curl --silent --fail"

echo "Requesting IMDSv2 token"
token=$($curl_cmd -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 900")
alias CURL_WITH_TOKEN="$curl_cmd -H \"X-aws-ec2-metadata-token: $token\""

# Set CLI Output to text
export AWS_DEFAULT_OUTPUT="text"

# Disable pager output
# https://docs.aws.amazon.com/cli/latest/userguide/cli-usage-pagination.html#cli-usage-pagination-clientside
# AWS CLI v1 does not page its output, but v2 does so by default.
# Disabling the pager explicitly keeps the script non-interactive with either version.
export AWS_PAGER=""

# Set Instance Identity URI
II_URI="http://169.254.169.254/latest/dynamic/instance-identity/document"

# Retrieve the instance ID
INSTANCE_ID=$(CURL_WITH_TOKEN $II_URI | grep instanceId | awk -F\" '{print $4}')

# Set region of NAT instance
export AWS_DEFAULT_REGION=$(CURL_WITH_TOKEN $II_URI | grep region | awk -F\" '{print $4}')

echo "Beginning self-managed NAT configuration"
configure_nat
configure_route_table
echo "Configuration completed successfully!"
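
One thing worth double-checking: the script above prints net.ipv4.ip_forward but never sets it. Presumably that is fine here because the Docker daemon on the ECS AMI turns IP forwarding on when it starts, but that is an assumption, and on an instance without Docker it may not hold. A minimal sketch of a check (forwarding_enabled is a hypothetical helper, not part of tinynat.sh):

```bash
#!/bin/bash
# Sketch: report whether IPv4 forwarding is enabled.
# The optional path argument exists only so the check can be tried on a test file.
forwarding_enabled() {
  local flag_file="${1:-/proc/sys/net/ipv4/ip_forward}"
  [ "$(cat "$flag_file" 2>/dev/null)" = "1" ]
}

if forwarding_enabled; then
  echo "IPv4 forwarding is on"
else
  echo "IPv4 forwarding is off; as root, enable it with: sysctl -w net.ipv4.ip_forward=1"
fi
```

If forwarding turns out to be off, the sysctl call would need to run at every boot, just like the iptables rules.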

And the configuration file initscript/tinynat.conf is a simple two-line affair:

route_table_ids_csv=/etc/tinynat-route-table-ids.csv
private_net_cidr_ranges=/etc/tinynat-private-nets.csv

Permissions setup

The init script reads info about the route table of your internal network and modifies it so that the NAT instance is the default gateway of the internal network. In order for it to do that, the EC2 instance needs an IAM policy that allows both reading route tables and creating or replacing routes.

data "aws_iam_policy_document" "tinynat_ec2_policy" {
  statement {
    sid    = "tinynatDescribeRoutePermissions"
    effect = "Allow"
    actions = [
      "ec2:DescribeRouteTables"
    ]
    resources = ["*"]
  }

  statement {
    sid    = "tinynatModifyRoutePermissions"
    effect = "Allow"
    actions = [
      "ec2:CreateRoute",
      "ec2:ReplaceRoute"
    ]
    resources = [
      for route_table in [aws_route_table.private.id]
      : "arn:aws:ec2:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:route-table/${route_table}"
    ]
  }
}

resource "aws_iam_role_policy" "tinynat_ec2" {
  name   = "tinynat-policy"
  policy = data.aws_iam_policy_document.tinynat_ec2_policy.json
  role   = aws_iam_role.machine_role.name
}
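
For reference, the resources list in the second statement expands to one ARN per route table. The sketch below shows the shape of that string. The region, account ID and route table ID are placeholders, and the "aws" partition is an assumption that holds for the standard commercial regions:

```bash
#!/bin/bash
# Sketch: build the route table ARN that the policy's "resources" list contains.
# All three arguments in the example call are placeholders, not real identifiers.
route_table_arn() {
  local region="$1" account_id="$2" route_table_id="$3"
  printf 'arn:aws:ec2:%s:%s:route-table/%s\n' "$region" "$account_id" "$route_table_id"
}

route_table_arn eu-north-1 123456789012 rtb-0123456789abcdef0
```

Scoping the modify permissions to specific route tables means the instance can't rewrite routes elsewhere in the account, while ec2:DescribeRouteTables is granted on all resources.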

You'll also need to verify that your security groups allow instances in your private network to contact the NAT instance, using any protocols and ports they may need for their internet-facing traffic. The NAT instance, in turn, needs to be allowed to receive such traffic and to send it on to the public internet. Quite likely, said traffic will be TCP to port 443, i.e. HTTPS connections.

Wrap-up

With all this set up, servers in your internal network should now be able to contact the public internet.

When viewing the setup through the AWS web console, the resource map of your VPC should show that your private subnets are connected to the route table you specified in local.instance_cloudconfig_data, but it won't show where that table is routing the traffic. Viewing the route table, you should see a route with destination 0.0.0.0/0, i.e. the default route, with an ENI as its target. Finally, viewing that ENI, under Instance Details there should be a link to your NAT instance.
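
Another way to check, this time from inside the private network, is to ask an external service which public IP your traffic appears to come from and compare it to the NAT instance's public IP. A rough sketch, assuming curl is available on the private instance. The egress_ip and same_ip helpers and the NAT_PUBLIC_IP variable are hypothetical names for illustration:

```bash
#!/bin/bash
# Sketch: run on an instance in the private subnet to see which public IP
# its outbound traffic appears to come from.
egress_ip() {
  curl --silent --fail --max-time 5 https://checkip.amazonaws.com
}

# Compare two addresses, ignoring surrounding whitespace and newlines.
same_ip() {
  [ "$(printf '%s' "$1" | tr -d '[:space:]')" = "$(printf '%s' "$2" | tr -d '[:space:]')" ]
}

# NAT_PUBLIC_IP is a placeholder; set it to your NAT instance's public IP.
# Example:
#   NAT_PUBLIC_IP=203.0.113.10
#   same_ip "$(egress_ip)" "$NAT_PUBLIC_IP" && echo "traffic leaves via the NAT instance"
```

If the request times out instead, the security groups and the default route are the first things to look at.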

This, I hope, makes running small-time applications and the like on AWS more approachable. For me, at least, this approach brought major cost savings for running a tiny application, since I could make my existing EC2 instance double as a NAT gateway instead of running one separately.
