Chabane R. for Stack Labs


Combining IAM Roles for Service Accounts with Pod level Security Groups for a defense-in-depth strategy

In the previous part, we created our RDS instance. In this part, we'll put everything together and deploy Metabase to Kubernetes. Our objectives are to:

  • Enable IAM roles for service accounts.
  • Create an IAM role to connect to the RDS instance. It will be attached to the metabase service account.
  • Enable security groups for pods by attaching the managed policy AmazonEKSVPCResourceController to the Amazon EKS cluster role.
  • Create a security group whose members are allowed inbound access to RDS. It will be assigned to pods that use the metabase service account.
  • Upgrade the VPC CNI plugin to the latest version. Version 1.7.7 or later is required to enable security groups for pods in the EKS cluster.
  • Enable pod ENIs in the aws-node DaemonSet.
  • Deploy and test our Kubernetes manifests.


Enabling IAM roles for Service Account

To assign an IAM role to a pod, we need:

  • To create an IAM OIDC provider for the cluster. The cluster has an OpenID Connect issuer URL associated with it.
  • To create the IAM role and attach an IAM policy to it with the rds-db:connect permission that the service account needs:

Complete infra/plan/eks-cluster.tf with:




data "tls_certificate" "cert" {
  url = aws_eks_cluster.eks.identity[0].oidc[0].issuer
}

resource "aws_iam_openid_connect_provider" "openid" {
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = [data.tls_certificate.cert.certificates[0].sha1_fingerprint]
  url             = aws_eks_cluster.eks.identity[0].oidc[0].issuer
}

data "aws_iam_policy_document" "web_identity_assume_role_policy" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]
    effect  = "Allow"

    condition {
      test     = "StringEquals"
      variable = "${replace(aws_iam_openid_connect_provider.openid.url, "https://", "")}:sub"
      values   = ["system:serviceaccount:metabase:metabase"]
    }

    condition {
      test     = "StringEquals"
      variable = "${replace(aws_iam_openid_connect_provider.openid.url, "https://", "")}:aud"
      values   = ["sts.amazonaws.com"]
    }

    principals {
      identifiers = [aws_iam_openid_connect_provider.openid.arn]
      type        = "Federated"
    }
  }
}

resource "aws_iam_role" "web_identity_role" {
  assume_role_policy = data.aws_iam_policy_document.web_identity_assume_role_policy.json
  name               = "web-identity-role-${var.env}"
}



By combining the OpenID Connect (OIDC) identity provider and Kubernetes service account annotations, we will be able to use IAM roles at the pod level.

Inside EKS, an admission controller injects AWS session credentials into each pod according to the role named in the annotation on the service account the pod uses. The credentials are exposed through the AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE environment variables. [3]

For a detailed explanation of this capability, see the AWS blog post "Introducing fine-grained IAM roles for service accounts".
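
The two conditions in the trust policy above are keyed on the issuer URL with its https:// scheme removed, and the sub claim follows the pattern system:serviceaccount:<namespace>:<name>. As a rough illustration of that string handling only, the shell sketch below rebuilds both values. The function names and the issuer URL are hypothetical and are not part of the Terraform plan.

```shell
#!/bin/sh
# Sketch only: mirrors the replace(..., "https://", "") call in the trust policy.
# The issuer URL used below is a made-up example value.

# Build the IAM condition key for a given OIDC claim ("sub" or "aud").
oidc_condition_key() {
  issuer_url="$1"
  claim="$2"
  printf '%s:%s\n' "${issuer_url#https://}" "$claim"
}

# Build the subject that identifies one Kubernetes service account.
service_account_subject() {
  namespace="$1"
  name="$2"
  printf 'system:serviceaccount:%s:%s\n' "$namespace" "$name"
}

ISSUER="https://oidc.eks.eu-west-1.amazonaws.com/id/EXAMPLE0123456789"
oidc_condition_key "$ISSUER" sub
service_account_subject metabase metabase
```

Only a token whose sub claim matches the second value exactly can assume the role, which is why the namespace and service account name used later in the manifests must both be metabase.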

Now we can create the IAM role policy that allows access to the RDS instance from Kubernetes pods:

Complete infra/plan/eks-cluster.tf with:



resource "aws_iam_role_policy" "rds_access_from_k8s_pods" {
  name = "rds-access-from-k8s-pods-${var.env}"
  role = aws_iam_role.web_identity_role.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = [
          "rds-db:connect",
        ]
        Effect   = "Allow"
        Resource = "arn:aws:rds-db:${var.region}:${data.aws_caller_identity.current.account_id}:dbuser:${aws_db_instance.postgresql.resource_id}/metabase"
      }
    ]
  })
}
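
The Resource in this policy is an rds-db ARN that ends with the instance's resource ID and the database user name, not the instance name. As a minimal sketch of how that ARN is assembled, assuming a hypothetical helper name and placeholder values throughout:

```shell
#!/bin/sh
# Sketch only: assembles the resource ARN used by the rds-db:connect statement.
# All argument values below are hypothetical placeholders.

rds_db_user_arn() {
  region="$1"        # AWS region of the instance
  account_id="$2"    # 12-digit AWS account ID
  resource_id="$3"   # the instance's resource ID, not its name
  db_user="$4"       # database user allowed to use IAM authentication
  printf 'arn:aws:rds-db:%s:%s:dbuser:%s/%s\n' \
    "$region" "$account_id" "$resource_id" "$db_user"
}

rds_db_user_arn eu-west-1 123456789012 db-ABCDEFGHIJKL01234 metabase
```

Scoping the ARN to a single database user keeps the role from connecting as any other user on the same instance.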



Pod Security Group

To enable security groups for pods, we need to attach the managed policy AmazonEKSVPCResourceController to the cluster role. It allows the role to manage network interfaces, their private IP addresses, and their attachment to and detachment from instances.

Complete infra/plan/eks-cluster.tf with:



resource "aws_iam_role_policy_attachment" "eks-AmazonEKSVPCResourceController" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSVPCResourceController"
  role       = aws_iam_role.eks.name
}



Now let's create our pod security group.

Complete infra/plan/eks-node-group.tf with:




resource "aws_security_group" "rds_access" {
    name        = "rds-access-from-pod-${var.env}"
    description = "Allow RDS Access from Kubernetes Pods"
    vpc_id      = aws_vpc.main.id

    ingress {
        from_port = 3000
        to_port   = 3000
        protocol  = "tcp"
        self      = true
    }

    ingress {
        from_port       = 53
        to_port         = 53
        protocol        = "tcp"
        security_groups = [aws_eks_cluster.eks.vpc_config[0].cluster_security_group_id]
    }

    ingress {
        from_port       = 53
        to_port         = 53
        protocol        = "udp"
        security_groups = [aws_eks_cluster.eks.vpc_config[0].cluster_security_group_id]
    }

    egress {
        from_port   = 0
        to_port     = 0
        protocol    = "-1"
        cidr_blocks = ["0.0.0.0/0"]
    }

    tags = {
        Name        = "rds-access-from-pod-${var.env}"
        Environment = var.env
    }
}



To allow the pod to access the Amazon RDS instance, we need to allow the pod security group as the source of inbound / outbound traffic on the RDS port.

Update the VPC security group aws_security_group.sg in infra/plan/rds.tf with the following ingress / egress rules:




  ingress {
    from_port       = var.rds_port
    to_port         = var.rds_port
    protocol        = "tcp"
    security_groups = [aws_security_group.rds_access.id]
  }

  egress {
    from_port       = 1025
    to_port         = 65535
    protocol        = "tcp"
    security_groups = [aws_security_group.rds_access.id]
  }



Add the following outputs:



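output "sg-eks-cluster" {
    value = aws_eks_cluster.eks.vpc_config[0].cluster_security_group_id
}

output "sg-rds-access" {
    value = aws_security_group.rds_access.id
}

# Read later by `terraform output rds-access-role-arn` when patching the service account.
output "rds-access-role-arn" {
    value = aws_iam_role.web_identity_role.arn
}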



Let's deploy our modifications:



cd infra/envs/dev

terraform apply ../../plan/ 



Kubernetes configuration

Let's connect to the EKS cluster:



aws eks --region $REGION update-kubeconfig --name $EKS_CLUSTER_NAME



Now we need to enable pods to receive their own network interfaces. Before doing that, use the following command to print your cluster's CNI version:



kubectl describe daemonset aws-node --namespace kube-system | grep Image | cut -d "/" -f 2



The Amazon EKS cluster must be running Kubernetes version 1.17 and Amazon EKS platform version eks.3 or later.
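
The version printed by the previous command can be checked against the 1.7.7 minimum before deciding whether to upgrade. The sketch below is one way to do that comparison, assuming `sort -V` is available (GNU coreutils provides it). The helper names are hypothetical and the image tag is only an example value.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical. Relies on `sort -V`
# (version sort), which GNU coreutils provides.

# Extract "1.7.5" from an image reference such as "amazon-k8s-cni:v1.7.5-eksbuild.1".
cni_version_from_image() {
  tag="${1##*:}"        # keep what follows the last ':'
  version="${tag%%-*}"  # drop any build suffix such as "-eksbuild.1"
  printf '%s\n' "${version#v}"
}

# Succeed when $1 is the same as or newer than $2.
version_at_least() {
  lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
  [ "$lowest" = "$2" ]
}

current=$(cni_version_from_image "amazon-k8s-cni:v1.7.5-eksbuild.1")
if version_at_least "$current" "1.7.7"; then
  echo "CNI $current supports security groups for pods"
else
  echo "CNI $current is too old, upgrade first"
fi
```

A version sort is used instead of a plain string comparison so that a release such as 1.10.0 is correctly treated as newer than 1.7.7.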

Upgrade your CNI version [1]



curl -o aws-k8s-cni.yaml https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/v1.7.9/config/v1.7/aws-k8s-cni.yaml
sed -i "s/us-west-2/$REGION/g" aws-k8s-cni.yaml
kubectl apply -f aws-k8s-cni.yaml



Enable the CNI plugin to manage network interfaces for pods by setting the ENABLE_POD_ENI variable to true in the aws-node DaemonSet. Once this setting is set to true, for each node in the cluster the plugin adds a label with the value vpc.amazonaws.com/has-trunk-attached=true. The VPC resource controller creates and attaches one special network interface called a trunk network interface with the description aws-k8s-trunk-eni [2].



kubectl set env daemonset -n kube-system aws-node ENABLE_POD_ENI=true



You can see which of your nodes have the vpc.amazonaws.com/has-trunk-attached label set to true with the following command.



$ kubectl get nodes -o wide -l vpc.amazonaws.com/has-trunk-attached=true

NAME                                       STATUS   ROLES    AGE   VERSION              INTERNAL-IP   EXTERNAL-IP     OS-IMAGE         KERNEL-VERSION                  CONTAINER-RUNTIME
ip-10-0-3-109.eu-west-1.compute.internal   Ready    <none>   56m   v1.18.9-eks-d1db3c   10.0.3.109    <none>          Amazon Linux 2   4.14.219-164.354.amzn2.x86_64   docker://19.3.13
ip-10-0-7-157.eu-west-1.compute.internal   Ready    <none>   56m   v1.18.9-eks-d1db3c   10.0.7.157    34.253.89.183   Amazon Linux 2   4.14.219-164.354.amzn2.x86_64   docker://19.3.13



Testing metabase connection to the RDS Instance

We deploy our Kubernetes manifests using Kustomize. Add the following manifests to the config/base folder.

config/base/service-account.yaml



apiVersion: v1
kind: ServiceAccount
metadata:
  labels: 
    app: metabase
  name: metabase



config/base/security-group-policy.yaml



apiVersion: vpcresources.k8s.aws/v1beta1
kind: SecurityGroupPolicy
metadata:
  name: metabase
spec:
  serviceAccountSelector: 
    matchLabels: 
      app: metabase



config/base/database-secret.yaml



apiVersion: v1
kind: Secret
metadata:
  name: metabase
type: Opaque
data:
  password: metabase



config/base/deployment.yaml



apiVersion: apps/v1
kind: Deployment
metadata:
  name: metabase
  labels:
    app: metabase
spec:
  selector:
    matchLabels:
      app: metabase
  replicas: 1
  template:
    metadata:
      labels:
        app: metabase
    spec:
      containers:
        - name: metabase
          image: metabase/metabase
          imagePullPolicy: IfNotPresent
          resources:
             requests:
               memory: "1Gi"
               cpu: "512m"
             limits:
               memory: "4Gi"
               cpu: "2000m"
          livenessProbe:
            httpGet:
              path: /api/health
              port: 3000
            initialDelaySeconds: 100
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /api/health
              port: 3000
            initialDelaySeconds: 60
            periodSeconds: 10



config/base/service.yaml



apiVersion: v1
kind: Service
metadata:
  name: metabase
  labels:
    app: metabase
spec:
  type: LoadBalancer
  ports:
    - port: 8000
      targetPort: 3000
      protocol: TCP
  selector:
    app: metabase



And finally our config/base/kustomization.yaml file



apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: metabase

resources:
- security-group-policy.yaml
- service-account.yaml
- deployment.yaml
- service.yaml
- database-secret.yaml



Now that we have our Kustomize base, we can patch the manifests with the values provided as Terraform outputs.

Create config/envs/$ENV/service-account.patch.yaml. We annotate the service account with the IAM role created before for RDS access.



apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    eks.amazonaws.com/role-arn: <RDS_ACCESS_ROLE_ARN>
  labels: 
    app: metabase
  name: metabase



Create config/envs/$ENV/security-group-policy.patch.yaml.

The SecurityGroupPolicy CRD specifies which security groups to assign to pods. Within a namespace, we can select pods based on pod labels, or based on labels of the service account associated with a pod. We define the security group IDs to be applied.



apiVersion: vpcresources.k8s.aws/v1beta1
kind: SecurityGroupPolicy
metadata:
  name: metabase
spec:
  serviceAccountSelector: 
    matchLabels: 
      app: metabase
  securityGroups:
    groupIds: 
      - <POD_SECURITY_GROUP_ID>
      - <EKS_CLUSTER_SECURITY_GROUP_ID>



Create config/envs/$ENV/database-secret.patch.yaml



apiVersion: v1
kind: Secret
metadata:
  name: metabase
type: Opaque
data:
  password: <MB_DB_PASS>



Create config/envs/$ENV/deployment.patch.yaml



apiVersion: apps/v1
kind: Deployment
metadata:
  name: metabase
  labels:
    app: metabase
spec:
  selector:
    matchLabels:
      app: metabase
  replicas: 1
  template:
    metadata:
      labels:
        app: metabase
    spec:
      serviceAccountName: metabase
      containers:
        - name: metabase
          image: metabase/metabase
          imagePullPolicy: IfNotPresent
          env:
          - name: MB_DB_TYPE
            value: postgres
          - name: MB_DB_HOST
            value: <MB_DB_HOST>
          - name: MB_DB_PORT
            value: "5432"
          - name: MB_DB_DBNAME
            value: metabase
          - name: MB_DB_USER
            value: metabase
          - name: MB_DB_PASS
            valueFrom:
              secretKeyRef:
                name: metabase
                key: password
      nodeSelector:
          type: private



And the config/envs/$ENV/kustomization.yaml file



apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: metabase

resources:
- ../../base

patchesStrategicMerge:
- security-group-policy.patch.yaml
- service-account.patch.yaml
- database-secret.patch.yaml
- deployment.patch.yaml



Let's replace the placeholders with real values:



cd config/envs/dev
# Generate DB auth token
METABASE_PWD=$(aws rds generate-db-auth-token --hostname $(terraform output private-rds-endpoint) --port 5432 --username metabase --region $REGION)
METABASE_PWD=$(echo -n "$METABASE_PWD" | base64 -w 0)
# Base64 output can contain "/", so use "|" as the sed delimiter
sed -i "s|<MB_DB_PASS>|$METABASE_PWD|g" database-secret.patch.yaml
sed -i "s/<POD_SECURITY_GROUP_ID>/$(terraform output sg-rds-access)/g; s/<EKS_CLUSTER_SECURITY_GROUP_ID>/$(terraform output sg-eks-cluster)/g" security-group-policy.patch.yaml
sed -i "s,<RDS_ACCESS_ROLE_ARN>,$(terraform output rds-access-role-arn),g" service-account.patch.yaml
sed -i "s/<MB_DB_HOST>/$(terraform output private-rds-endpoint)/g" deployment.patch.yaml
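
The choice of sed delimiter matters in these substitutions, because a value that contains the delimiter breaks the expression. The sketch below shows the same placeholder replacement with a single delimiter that none of the values here contain. The fill_placeholder helper is hypothetical, the temporary file stands in for a patch file, and the value is a made-up base64-like string rather than a real token.

```shell
#!/bin/sh
# Sketch only: fill_placeholder is a hypothetical helper, and the temporary
# file stands in for database-secret.patch.yaml.

# Replace every occurrence of a placeholder in a file.
# '|' is the sed delimiter because it does not appear in base64 output,
# security group IDs, host names, or ARNs.
fill_placeholder() {
  file="$1"
  placeholder="$2"
  value="$3"
  sed "s|$placeholder|$value|g" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

patch_file=$(mktemp)
printf 'data:\n  password: <MB_DB_PASS>\n' > "$patch_file"

# A made-up base64-like string containing '/', '+' and '='.
fill_placeholder "$patch_file" "<MB_DB_PASS>" "dG9rZW4/c2ln+bmF0dXJl=="
cat "$patch_file"
rm -f "$patch_file"
```

Writing to a temporary copy and moving it back avoids `sed -i`, whose syntax differs between GNU and BSD sed.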



Apply the manifests:



kubectl create namespace metabase
kubectl config set-context --current --namespace=metabase
kustomize build . | kubectl apply -f -



Let's see if it worked:



$ kubectl get pods

NAME                        READY   STATUS    RESTARTS   AGE
metabase-6d47d7b94b-796sx   1/1     Running   2          98s




$ kubectl describe pods metabase-6d47d7b94b-796sx

Name:         metabase-6d47d7b94b-796sx
Namespace:    metabase
Priority:     0
Node:         ip-10-0-3-109.eu-west-1.compute.internal/10.0.3.109
[..]
Labels:       app=metabase
              pod-template-hash=6d47d7b94b
Annotations:  kubernetes.io/psp: eks.privileged
              vpc.amazonaws.com/pod-eni:
                [{"eniId":"eni-054df22ad2b1b89c3","ifAddress":"02:3b:a8:a7:9c:f5","privateIp":"10.0.3.128","vlanId":1,"subnetCidr":"10.0.2.0/23"}]
Status:       Running
IP:           10.0.3.128
IPs:
  IP:           10.0.3.128
[..]
Node-Selectors:  type=private
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason                  Age   From                     Message
  ----    ------                  ----  ----                     -------
  Normal  Scheduled               32s   default-scheduler        Successfully assigned metabase/metabase-6d47d7b94b-796sx to ip-10-0-3-109.eu-west-1.compute.internal
  Normal  SecurityGroupRequested  32s   vpc-resource-controller  Pod will get the following Security Groups [sg-0c0195a69b1b8bdc3 sg-0d4b509bad15ec963]
  Normal  ResourceAllocated       31s   vpc-resource-controller  Allocated [{"eniId":"eni-054df22ad2b1b89c3","ifAddress":"02:3b:a8:a7:9c:f5","privateIp":"10.0.3.128","vlanId":1,"subnetCidr":"10.0.2.0/23"}] to the pod
  Normal  Pulled                  31s   kubelet                  Container image "metabase/metabase" already present on machine
  Normal  Created                 31s   kubelet                  Created container metabase
  Normal  Started                 31s   kubelet                  Started container metabase



As we can see, the security groups have been attached to the pod.



$ kubectl logs metabase-6d47d7b94b-796sx

[..]
2021-03-20 13:22:35,660 INFO metabase.core :: Setting up and migrating Metabase DB. Please sit tight, this may take a minute...
2021-03-20 13:22:35,663 INFO db.setup :: Verifying postgres Database Connection ...
2021-03-20 13:22:40,245 INFO db.setup :: Successfully verified PostgreSQL 12.5 application database connection. ✅
2021-03-20 13:22:40,246 INFO db.setup :: Running Database Migrations...
2021-03-20 13:22:40,387 INFO db.setup :: Setting up Liquibase...
2021-03-20 13:22:40,502 INFO db.setup :: Liquibase is ready.
2021-03-20 13:22:40,503 INFO db.liquibase :: Checking if Database has unrun migrations...
2021-03-20 13:22:42,900 INFO db.liquibase :: Database has unrun migrations. Waiting for migration lock to be cleared...
2021-03-20 13:22:42,980 INFO db.liquibase :: Migration lock is cleared. Running migrations...
2021-03-20 13:22:48,068 INFO db.setup :: Database Migrations Current ...  ✅
[..]
2021-03-20 13:23:13,054 INFO metabase.core :: Metabase Initialization COMPLETE



If the deployment is created before the SecurityGroupPolicy, the pod does not receive the security groups and you will get a "connect timed out" error. In that case, delete and recreate the deployment.

Now, let's delete the security group policy and recreate the deployment to check that the connection fails.



$ kubectl delete -f security-group-policy.patch.yaml
$ kubectl delete -f deployment.patch.yaml
$ kubectl apply -f deployment.patch.yaml

$ kubectl logs metabase-6d47d7b94b-wbn4r

2021-03-20 13:31:32,993 INFO db.setup :: Verifying postgres Database Connection ...
2021-03-20 13:31:43,052 ERROR metabase.core :: Metabase Initialization FAILED
clojure.lang.ExceptionInfo: Unable to connect to Metabase postgres DB.
[..]
Caused by: java.net.SocketTimeoutException: connect timed out
[..]
2021-03-20 13:31:43,072 INFO metabase.core :: Metabase Shutting Down ...
2021-03-20 13:31:43,077 INFO metabase.server :: Shutting Down Embedded Jetty Webserver
2021-03-20 13:31:43,088 INFO metabase.core :: Metabase Shutdown COMPLETE



As you can see, metabase is no longer authorized to access the RDS instance.

As a last check, let's apply the security group policy again and remove the annotation on the service account that attaches the IAM role to the pod.



$ kubectl annotate sa metabase eks.amazonaws.com/role-arn-
$ kubectl apply -f security-group-policy.patch.yaml
$ kubectl delete -f deployment.patch.yaml
$ kubectl apply -f deployment.patch.yaml

2021-03-20 13:43:42,329 INFO db.setup :: Verifying postgres Database Connection ...
2021-03-20 13:43:42,710 ERROR metabase.core :: Metabase Initialization FAILED
clojure.lang.ExceptionInfo: Unable to connect to Metabase postgres DB.
[..]
Caused by: org.postgresql.util.PSQLException: FATAL: PAM authentication failed for user "metabase"
[..]



As you can see, Metabase can no longer authenticate as the database user "metabase", so it is not authorized to access the database.
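
The two tests fail with different errors, which makes it possible to tell which layer rejected the pod. As a small illustration, the sketch below maps each error message from the logs above to its layer. The classify_db_failure helper is hypothetical and only matches the two messages shown in this article.

```shell
#!/bin/sh
# Sketch only: classify_db_failure is a hypothetical helper that maps the two
# error messages seen in the logs above to the layer that rejected the pod.

classify_db_failure() {
  case "$1" in
    *"connect timed out"*)
      echo "network: the pod's security groups cannot reach RDS" ;;
    *"PAM authentication failed"*)
      echo "authentication: the database rejected the IAM auth token" ;;
    *)
      echo "unknown" ;;
  esac
}

classify_db_failure 'Caused by: java.net.SocketTimeoutException: connect timed out'
classify_db_failure 'Caused by: org.postgresql.util.PSQLException: FATAL: PAM authentication failed for user "metabase"'
```

A timeout means the packet never reached the database, while a PAM failure means the connection was accepted at the network layer and refused at login.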

Conclusion

In this long workshop, we:

  • Created an isolated network to host our Amazon RDS instance.
  • Configured an Amazon EKS cluster with fine-grained access control to Amazon RDS.
  • Tested the connectivity between a Kubernetes container and an RDS database instance.

That's it!

Clean up



kustomize build . | kubectl delete -f -
cd ../../../infra/envs/$ENV
terraform destroy ../../plan/





Final Words

The source code is available on GitLab.

If you have any questions or feedback, please feel free to leave a comment.

Otherwise, I hope I have helped you answer some of the hard questions about connecting Amazon EKS to Amazon RDS and providing a pod level defense in depth security strategy at both the networking and authentication layers.

By the way, do not hesitate to share with peers 😊

Thanks for reading!

Documentation

[1] https://docs.aws.amazon.com/eks/latest/userguide/cni-upgrades.html
[2] https://docs.aws.amazon.com/eks/latest/userguide/security-groups-for-pods.html
[3] https://eksctl.io/usage/iamserviceaccounts/#how-it-works

Top comments (1)

Yoni Leitersdorf

Hi Chabane,

Great series of articles! I was curious what results I'd get if I ran Cloudrail against your TF code, and looks like everything checks out!