
Dr. Malte Polley

Cloud Development Kit (CDK) Networking Deployment – Introduction and Overcoming a Defect in AWS Resource Access Manager

Introduction: The Need for Network Operations Centers in AWS

When I first began creating AWS infrastructure, cross-account deployments were not common practice. However, over the past seven years, this landscape has changed significantly. Two primary reasons for distributing workloads across multiple accounts are the concepts of segregation of duties and the ability to exceed common AWS account API and resource limits.

During my time at AWS, I had the opportunity to design and implement Networking Operations Centers (NoCs). The purpose of a NoC is to centralize certain aspects of network design within an organization’s AWS cloud infrastructure. This approach is relatively straightforward to implement.

First, use the Transit Gateway as the central router for your network. Next, create private Virtual Private Clouds (VPCs) and attach them to the Transit Gateway. You can create these locally in each AWS account, or create them in your networking account and share them with other accounts using AWS Resource Access Manager (RAM). To manage IP addresses effectively and avoid overlapping CIDRs in a dynamic environment, consider using Amazon VPC IP Address Manager (IPAM), which assigns CIDRs centrally from a designated IP pool.
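To make the idea of central CIDR assignment concrete, here is a minimal sketch that uses only Python's standard library. It is not how IPAM works internally; the pool `10.0.0.0/16` and the helper `allocate_vpc_cidrs` are assumptions made for illustration.

```python
import ipaddress
from typing import Iterator


def allocate_vpc_cidrs(pool_cidr: str, vpc_prefix: int) -> Iterator[ipaddress.IPv4Network]:
    """Yield non-overlapping VPC CIDRs of size /vpc_prefix from one pool."""
    pool = ipaddress.ip_network(pool_cidr)
    # subnets() walks the pool in order, so no two results can overlap.
    yield from pool.subnets(new_prefix=vpc_prefix)


# Hypothetical pool; every VPC request is served from the same central range.
allocator = allocate_vpc_cidrs("10.0.0.0/16", 26)
first = next(allocator)
second = next(allocator)
print(first, second)           # 10.0.0.0/26 10.0.0.64/26
print(first.overlaps(second))  # False
```

Because every request draws from the same pool, two VPCs can never receive the same range, which is the property the central IP pool is meant to guarantee.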

Additionally, deny the creation of local Internet Gateways in each account through Service Control Policies (SCPs). Implement a security VPC with a firewall of your choice behind a Gateway Load Balancer. Finally, establish centralized egress and ingress Internet access VPCs, or integrate this functionality into your security VPC. For more detailed information, I highly recommend reading "Building a Scalable and Secure Multi-VPC AWS Network Infrastructure."
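A Service Control Policy for this could look roughly like the following sketch. The statement ID and the exact list of actions are my assumptions; adapt them to your organization, and attach the policy only to the organizational units that hold workload accounts, not to the networking account itself.

```python
import json

# Sketch of an SCP document that blocks local Internet Gateways.
deny_igw_scp = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyLocalInternetGateways",  # illustrative name
            "Effect": "Deny",
            "Action": [
                "ec2:CreateInternetGateway",
                "ec2:AttachInternetGateway",
                "ec2:CreateEgressOnlyInternetGateway",
            ],
            "Resource": "*",
        }
    ],
}

# Render the policy as JSON, ready to be pasted into AWS Organizations.
print(json.dumps(deny_igw_scp, indent=2))
```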

Simplified Network Diagram

To visualize our approach at MRH-Trowe, consider the following simplified network diagram. All resources are created in the networking account and shared across various business units. By adopting this method, we prevent changes to the underlying network infrastructure by business units, establish a robust compliance foundation, and simplify processes for application developers, who typically do not focus on network infrastructure.

MRH Trowe simplified Network Architecture

Automating Your Network Deployment: Leveraging the Cloud Development Kit (CDK)

Automation is crucial for creating and maintaining AWS workloads, especially when you need to build and tear down resources daily. The Cloud Development Kit (CDK) provides a programmatic approach to infrastructure management via AWS CloudFormation. While I won’t delve deeply into CDK basics, I encourage you to explore this tool if you haven't already—it's a game changer.

To create a layer-based approach for our NoC, we can structure our project with the following layers:

  1. IP Address Manager (ipam_stack.py): This manages the IP pool we want to control.
  2. Security VPC and Transit Gateway (security_vpc_stack.py): This layer handles security and routing.
  3. Gateway Load Balancer Based Firewall (firewall_stack.py): This ensures robust security measures.

After these foundational layers, we can add multiple VPCs with automated route table entries and Transit Gateway attachments.

network-operation-center/
|-app.py
|-stacks/
|--ipam_stack.py
|--security_vpc_stack.py
|--firewall_stack.py
|--vpc1_stack.py
…
|--vpc_n_stack.py

While the first three stacks are somewhat specialized, we can leverage a more generic Python class for our VPCs, allowing for easier sharing later on. Below is an example of how to define a VPC construct tailored for MRH-Trowe's network:

from aws_cdk import (
    Aws,
    Tags,
    CfnTag,
    Duration,
    RemovalPolicy,
    aws_ec2 as ec2,
    aws_iam as iam,
    aws_ram as ram,
)
from stacks.security_vpc_stack import SecurityVPCStack
from constructs import Construct

class VPCDefinition(Construct):
    """Create a general-purpose VPC customized construct."""

    def __init__(
        self,
        scope: Construct,
        id: str,
        vpc_net_mask: int,
        vpc_name: str,
        default_cidr: str,
        transit_gateway_id: str,
        network_account_id: str,
        security_vpc_id: str = None,
        gwlb_param: str = None,
        ipv4_ipam_pool_id: str = None,
        max_azs: int = 2,
        amount_of_natgateways: int = 0,
        tgw_attach_cidr_mask: int = 28,
        public_service_cidr_mask: int = None,
        private_service_cidr_mask: int = None,
        isolated_service_cidr_mask: int = None,
        local_cidr: str = None,
        gwlb_service_name: str = None,
        tgw_attachement: bool = True,
    ) -> None:
        """Initialize CDK construct class."""
        super().__init__(scope, id)
        self.subnet_config = []
        self.tgw_subnet_ids = []
        self.tgw_route_table_ids = []
        self.tgw_azs = []
        self.id_collection = []
        self.vpc_net_mask = vpc_net_mask
        self.default_cidr = default_cidr
        self.transit_gateway_id = transit_gateway_id
        self.network_account_id = network_account_id
        self.ipv4_ipam_pool_id = ipv4_ipam_pool_id
        self.tgw_attach_cidr_mask = tgw_attach_cidr_mask
        self.public_service_cidr_mask = public_service_cidr_mask
        self.private_service_cidr_mask = private_service_cidr_mask
        self.isolated_service_cidr_mask = isolated_service_cidr_mask
        self.vpc_name = vpc_name
        self.max_azs = max_azs
        self.amount_of_natgateways = amount_of_natgateways
        self.local_cidr = local_cidr
        self.security_vpc_id = security_vpc_id
        self.tgw_attachement = tgw_attachement

        # Create a subnet set for the TGW attachment (one subnet per AZ)
        if self.tgw_attach_cidr_mask == 28:
            self.subnet_config.append(
                ec2.SubnetConfiguration(
                    cidr_mask=tgw_attach_cidr_mask,
                    name=f"tgw-attachment-{self.vpc_name}",
                    # MapPublicIpOnLaunch only applies to public subnets,
                    # so it is omitted for this isolated subnet group.
                    subnet_type=ec2.SubnetType.PRIVATE_ISOLATED,
                )
            )
        # Additional subnet configurations can be added here as needed

        # Request a CIDR from IPAM
        ip_assignment = ec2.IpAddresses.aws_ipam_allocation(
            ipv4_ipam_pool_id=ipv4_ipam_pool_id,
            ipv4_netmask_length=self.vpc_net_mask,
        )
        # Create the actual VPC
        self.vpc = ec2.Vpc(
            self,
            id="VPC",
            ip_addresses=ip_assignment,
            max_azs=self.max_azs,
            nat_gateways=self.amount_of_natgateways,
            subnet_configuration=self.subnet_config,
            vpc_name=vpc_name,
            restrict_default_security_group=True,
        )
        # Tagging the VPC
        tags = {
            "Shared": "True",
            "SourceAccountId": self.network_account_id,
            "SourceAccountName": "Networking",
            "AWS Region": Aws.REGION,
        }
        for key, value in tags.items():
            Tags.of(self.vpc).add(key, value)

        # Collect subnet and route table IDs for automatic route table adoption
        if self.tgw_attach_cidr_mask == 28:
            selection_tgw_attach = self.vpc.select_subnets(
                subnet_group_name=f"tgw-attachment-{self.vpc_name}"
            )
            for i in selection_tgw_attach.subnets:
                self.id_collection.append(
                    {
                        "RouteTableId": i.route_table.route_table_id,
                        "Az": i.availability_zone,
                    }
                )
                self.tgw_subnet_ids.append(i.subnet_id)

        # Create Transit Gateway attachments and route table propagations
        transit_gateway_attachment = ec2.CfnTransitGatewayAttachment(
            self,
            id="TransitGatewayAttachment",
            subnet_ids=self.tgw_subnet_ids,
            transit_gateway_id=self.transit_gateway_id,
            vpc_id=self.vpc.vpc_id,
            tags=[CfnTag(key="Name", value=f"{vpc_name}-tgw-attachment")],
            options={"ApplianceModeSupport": "enable"},
        )
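        # Create the TGW route table association.
        # NOTE: self.tgw_spoke_route_table_id is not assigned in this excerpt;
        # it must hold the ID of the spoke TGW route table before this point.
        ec2.CfnTransitGatewayRouteTableAssociation(
            self,
            id="SpokeRouteTableAssociation",
            transit_gateway_attachment_id=transit_gateway_attachment.ref,
            transit_gateway_route_table_id=self.tgw_spoke_route_table_id,
        )
        # Create a route for each subnet towards the security VPC
        # (via the Transit Gateway) using default_cidr, e.g. 0.0.0.0/0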
        for idx, i in enumerate(self.id_collection):
            ec2.CfnRoute(
                self,
                id=f"Route{idx}",
                route_table_id=i["RouteTableId"],
                destination_cidr_block=default_cidr,
                transit_gateway_id=transit_gateway_id,
            ).node.add_dependency(transit_gateway_attachment)
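When choosing values for vpc_net_mask, max_azs and the subnet masks, it helps to check up front that the subnets fit into the VPC. The helper below is a hypothetical sketch (`subnets_fit` is not part of the CDK); it only compares address counts and assumes one subnet per mask in every Availability Zone.

```python
def subnets_fit(vpc_net_mask: int, max_azs: int, subnet_masks: list[int]) -> bool:
    """Return True if one subnet per mask and AZ fits into a /vpc_net_mask VPC."""
    vpc_size = 2 ** (32 - vpc_net_mask)
    # Each subnet group is created once per Availability Zone.
    needed = sum(2 ** (32 - mask) for mask in subnet_masks) * max_azs
    return needed <= vpc_size


# A /26 VPC across 2 AZs with a /28 TGW subnet and a /28 service subnet:
print(subnets_fit(26, 2, [28, 28]))  # True  (4 x 16 = 64 addresses)
# Replacing the service subnet with a /27 no longer fits:
print(subnets_fit(26, 2, [28, 27]))  # False (2 x 48 = 96 addresses)
```

A check like this fails fast in a unit test instead of during the CloudFormation deployment.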

This construct does a lot of the heavy lifting and is easy to reuse, as the following example shows:

from constructs import Construct
from stacks.ipam_stack import IPAMStack
from stacks.security_vpc_stack import SecurityVPCStack
from stacks.network_creation import VPCDefinition
from aws_cdk import (
    Stack,
    aws_ram as ram,
)

class SomeVPC(Stack):
    """Create the actual deployment in each AWS account.

    Args:
        scope (Construct): parent construct, e.g. the CDK app or a stage
        construct_id (str): logical ID of this stack
    """

    def __init__(
        self, scope: Construct, construct_id: str, **kwargs
    ) -> None:
        """Initialise CDK stack class."""
        super().__init__(scope, construct_id, **kwargs)
        # Parameters for the actual CloudFormation stack
        self.vpc_name = "my-vpc"
        self.network_account_id = "12345678910"
        self.param_name_ipam_pool = "ipam-id"
        self.param_name_tgw_id = "tgw-id"
        self.default_cidr = "0.0.0.0/0"
        self.security_vpc_name = "my-sec-vpc"

        self.ipv4_ipam_pool_id = IPAMStack.return_pool_id(
            stack=self, pool_name=self.param_name_ipam_pool
        )
        self.transit_gateway_id = SecurityVPCStack.return_tgw_id(
            stack=self, param_name_tgw=self.param_name_tgw_id
        )

        vpc = VPCDefinition(
            self,
            id=self.vpc_name,
            transit_gateway_id=self.transit_gateway_id,
            network_account_id=self.network_account_id,
            ipv4_ipam_pool_id=self.ipv4_ipam_pool_id,
            default_cidr=self.default_cidr,
            vpc_name=self.vpc_name,
            vpc_net_mask=26,
            public_service_cidr_mask=28,
        )

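        # Share the subnets with the target account via AWS RAM.
        # The share name is illustrative; the original leaves
        # name_resource_share undefined.
        name_resource_share = f"{self.vpc_name}-share"
        ram.CfnResourceShare(
            self,
            id="CfnResourceVPCShare",
            name=name_resource_share,
            allow_external_principals=False,
            principals=["12345678911"],  # list of target account IDs
            # One ARN per list element (placeholders)
            resource_arns=["subnet_arn_1", "subnet_arn_2"],  # … subnet_arn_n
        )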

As you can see, we just need to pass a few parameters and a new VPC is born. With ram.CfnResourceShare(), we share the subnets with a specific target account via AWS Resource Access Manager. By sharing the subnets, the whole VPC becomes available in the target account, including its route tables and routes.
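The resource share expects one ARN per subnet. A small helper can build them from the subnet IDs; `subnet_arns`, the region, and the IDs below are hypothetical and only illustrate the ARN layout for the standard `aws` partition.

```python
def subnet_arns(region: str, account_id: str, subnet_ids: list[str]) -> list[str]:
    """Build one EC2 subnet ARN per subnet ID (standard 'aws' partition)."""
    return [
        f"arn:aws:ec2:{region}:{account_id}:subnet/{subnet_id}"
        for subnet_id in subnet_ids
    ]


# Placeholder values for illustration only.
arns = subnet_arns("eu-central-1", "111111111111", ["subnet-aaa", "subnet-bbb"])
print(arns[0])  # arn:aws:ec2:eu-central-1:111111111111:subnet/subnet-aaa
```

Inside a CDK stack, the same function can be fed with the subnet IDs of the created VPC, and its result can be passed as resource_arns.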

Using the Network Resources: The Power of Tags

When utilizing network resources, the CDK offers a convenient method called ec2.Vpc.from_lookup(). This allows you to access all relevant information using the VPC ID. However, a significant issue arises: AWS Resource Access Manager does not share the tags created by the CDK for the network infrastructure components. This hinders the lookup functionality essential for effective resource management.

A sample subnet created by the CDK approach described in this blog post carries tags such as aws-cdk:subnet-type:isolated and aws-cdk:subnet-name:private-service-mrht-teamviewer-vpc. These tags are crucial for the lookup functionality.
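To spot the problem early, you can check whether a subnet still carries these tags when viewed from the target account. The sketch below assumes that the two tag keys named above are the ones the lookup depends on; `missing_lookup_tags` is a hypothetical helper that works on the tag list returned by a describe call.

```python
REQUIRED_LOOKUP_TAGS = ("aws-cdk:subnet-type", "aws-cdk:subnet-name")


def missing_lookup_tags(tags: list[dict]) -> list[str]:
    """Return the lookup tag keys that are absent from a subnet's tag list."""
    present = {tag["Key"] for tag in tags}
    return [key for key in REQUIRED_LOOKUP_TAGS if key not in present]


# Tags as seen in the networking account (owner view).
owner_view = [
    {"Key": "aws-cdk:subnet-type", "Value": "Isolated"},
    {"Key": "aws-cdk:subnet-name", "Value": "private-service-mrht-teamviewer-vpc"},
]
# The same subnet seen from the target account: no tags were shared.
participant_view: list[dict] = []

print(missing_lookup_tags(owner_view))        # []
print(missing_lookup_tags(participant_view))  # ['aws-cdk:subnet-type', 'aws-cdk:subnet-name']
```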

To mitigate this defect, we can leverage a custom resource: an AWS Lambda-backed component within our CDK app that gathers the tags in the networking account and replicates them to the destination account. The Lambda function needs permissions in your networking account to describe VPCs, subnets and route tables, and it assumes a role in the destination account to create the tags there. We need to remove tags whose keys start with 'aws:'; otherwise, you will see this error: An error occurred (InvalidParameterValue) when calling the CreateTags operation: Value ( aws:cloudformation:stack-name ) for parameter key is invalid.

import json
import boto3
import os
import logging
import urllib3
from botocore.exceptions import ClientError, ParamValidationError

log_level = os.environ.get("LOG_LEVEL", "INFO")
logging.root.setLevel(logging.getLevelName(log_level))
logger = logging.getLogger(__name__)
http = urllib3.PoolManager()

ec2_client = boto3.client("ec2")
sts_client = boto3.client("sts")

def send(
    http,
    event,
    context,
    response_status,
    response_data,
    physical_resource_id=None,
    no_echo=False,
    reason="-",
):
    """Build CustomResource.

    Args:
        http (urllib3.PoolManager): object for put requests
        event (dict): Lambda event dict
        context (object): Lambda object
        response_status (string): result of this custom resource
        response_data (dict): additional data from this custom resource
        reason (string): error result of this custom resource
        physical_resource_id (string, optional): CloudFormation physical resource id. Defaults to None.
        no_echo (bool, optional): Echo mode activation. Defaults to False.

    Returns:
        No returns
    """
    response_url = event["ResponseURL"]
    logger.info(response_url)
    response_body = {}
    response_body["Status"] = response_status
    response_body["Reason"] = reason
    response_body["PhysicalResourceId"] = (
        physical_resource_id or context.log_stream_name
    )
    response_body["StackId"] = event["StackId"]
    response_body["RequestId"] = event["RequestId"]
    response_body["LogicalResourceId"] = event["LogicalResourceId"]
    response_body["NoEcho"] = no_echo
    response_body["Data"] = response_data

    json_response_body = json.dumps(response_body)
    logger.info("Response body:\n" + json_response_body)
    headers = {"content-type": "", "content-length": str(len(json_response_body))}

    try:
        response = http.request(
            "PUT",
            response_url,
            body=json_response_body.encode("utf-8"),
            headers=headers,
        )
        logger.info("Status code: %s", response.status)
        return True
    except Exception as e:
        logger.exception("send(..) failed executing http.request(..): " + str(e))
        return False

def assume_role(role_arn: str, sts_client):
    """Assume an IAM role in a different account.

    Args:
        role_arn (str): ARN of the IAM role to assume in the target account.
        sts_client: boto3 STS client used for the AssumeRole call.

    Raises:
        ClientError, ParamValidationError: boto3 could not assume the role.

    Returns:
        dict: AssumeRole response; the tokens are under "Credentials".
    """
    # Errors propagate to the caller, which reports FAILED to CloudFormation.
    response = sts_client.assume_role(
        RoleArn=role_arn, RoleSessionName="SyncSharedNetworkTags"
    )
    return response

def clean_tags(tags: list[dict]):
    """Remove tags whose key starts with 'aws:'.

    Such keys are reserved for internal use; passing them to CreateTags fails with:
    An error occurred (InvalidParameterValue) when calling the CreateTags operation:
    Value ( aws:cloudformation:stack-name ) for parameter key is invalid.

    Params:
        tags (list[dict]): tags gathered from describe calls

    Returns:
        list[dict]: tags whose key does not start with 'aws:'
    """
    return [t for t in tags if not t['Key'].startswith('aws:')]

def main(event, context, sts_client=sts_client, source_ec2_client=ec2_client, http=http):
    """Handle create, update and delete requests of the custom resource.

    Params:
        event: Lambda event object
        context: Lambda context object
        sts_client: boto3 client for STS
        source_ec2_client: boto3 client for EC2 in the networking account
        http: urllib3 pool manager used to answer CloudFormation
    """
    logger.info("Starting Network tag management ...")
    logger.info(event)
    logger.info(context)

    try:
        vpc_id = os.environ['VPC_ID']
        role_arn = os.environ['TARGET_IAM_ROLE_ARN']
    except KeyError as e:
        logger.exception(e)
        send(
            http=http,
            event=event,
            context=context,
            response_status="FAILED",
            response_data={"Response": str(e)},
            reason=str(e),
        )
        raise

    if event["RequestType"] == "Create" or event["RequestType"] == "Update":
        try:
            logger.info("Getting foreign credentials ...")
            target_credentials = assume_role(role_arn=role_arn, sts_client=sts_client)
            target_access_key = target_credentials["Credentials"]["AccessKeyId"]
            target_secret_access_key = target_credentials["Credentials"]["SecretAccessKey"]
            target_sessions_token = target_credentials["Credentials"]["SessionToken"]
        except (KeyError, ClientError, ParamValidationError) as e:
            logger.exception(e)
            send(
                http=http,
                event=event,
                context=context,
                response_status="FAILED",
                response_data={"Response": str(e)},
                reason=str(e),
            )
            raise

        logger.info("Creating foreign EC2 client ...")
        target_ec2_client = boto3.client(
            'ec2', 
            aws_access_key_id=target_access_key,
            aws_secret_access_key=target_secret_access_key,
            aws_session_token=target_sessions_token
        )
        logger.info("Done ...")

        try:
            logger.info("Describing local VPC tags ...")
            vpc_tags = source_ec2_client.describe_vpcs(VpcIds=[vpc_id])['Vpcs'][0]['Tags']
            logger.info("Copying local tags to shared VPC in foreign account ...")
            target_ec2_client.create_tags(Resources=[vpc_id], Tags=clean_tags(vpc_tags))
            logger.info("Done ...")

            logger.info("Describing local Subnet tags ...")
            subnets = source_ec2_client.describe_subnets(Filters=[{'Name': 'vpc-id', 'Values': [vpc_id]}])['Subnets']
            for subnet in subnets:
                subnet_id = subnet['SubnetId']
                subnet_tags = clean_tags(subnet['Tags'])
                logger.info("Copying local tags to shared Subnets in foreign account ...")
                logger.info(subnet_id)
                logger.info(subnet_tags)
                target_ec2_client.create_tags(Resources=[subnet_id], Tags=subnet_tags)
                logger.info(f"Done for subnet {subnet_id} ...")

            logger.info("Describing local Route Tables tags ...")
            route_tables = source_ec2_client.describe_route_tables(Filters=[{'Name': 'vpc-id', 'Values': [vpc_id]}])['RouteTables']
            logger.info(route_tables)
            for route_table in route_tables:
                route_table_id = route_table['RouteTableId']
                route_table_tags = clean_tags(route_table.get('Tags', []))
                logger.info("Copying local tags to shared Route Tables in foreign account ...")
                logger.info(route_table_id)
                logger.info(route_table_tags)
                if len(route_table_tags) > 0:
                    target_ec2_client.create_tags(Resources=[route_table_id], Tags=route_table_tags)
                else:
                    logger.info(f"Found Route Table {route_table_id} without tags ...")
                logger.info(f"Done for route table {route_table_id} ...")
        except (KeyError, ClientError, TypeError) as e:
            logger.exception(e)
            send(
                http=http,
                event=event,
                context=context,
                response_status="FAILED",
                response_data={"Response": str(e)},
                reason=str(e),
            )
            raise
        logger.info(f"Finished Network tag management on VPC {vpc_id} ...")
        send(
            http=http,
            event=event,
            context=context,
            response_status="SUCCESS",
            response_data={"Response": f"Finished Network tag management on VPC {vpc_id}"},
        )
        return True

    if event["RequestType"] == "Delete":
        logger.info("Deleting process in progress")
        send(
            http=http,
            event=event,
            context=context,
            response_status="SUCCESS",
            response_data={"Response": "Deleting process in progress"},
        )
        return True
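A quick local check shows why the filter in the Lambda function above keeps the CDK tags: a key such as aws-cdk:subnet-type does not start with the reserved 'aws:' prefix. The sample tags are made up for illustration, and the function is repeated here so the snippet runs on its own.

```python
def clean_tags(tags: list[dict]) -> list[dict]:
    """Same filter as in the Lambda function: drop keys starting with 'aws:'."""
    return [t for t in tags if not t["Key"].startswith("aws:")]


# Made-up tags as they might be returned by a describe call.
sample = [
    {"Key": "aws:cloudformation:stack-name", "Value": "some-vpc"},
    {"Key": "aws-cdk:subnet-type", "Value": "Isolated"},
    {"Key": "Name", "Value": "my-vpc"},
]

kept = [t["Key"] for t in clean_tags(sample)]
print(kept)  # ['aws-cdk:subnet-type', 'Name']
```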

Conclusion: A Secure and Scalable Network

In summary, this blog post illustrates how MRH-Trowe successfully created a robust AWS network infrastructure using a CDK-based approach. While the transition to a programmatic model may feel unusual for networking professionals accustomed to scripting or tools like Ansible, the benefits of segregating duties between the networking team and the business units are undeniable.

By investing time and effort into this project, we have established a secure and scalable network that empowers non-network-related teams to utilize existing resources without concern.

Happy coding!
